Message from the Chairmen


Dear  Delegates, 

We take great pleasure in welcoming you to Singapore and to the 11th IEEE Signal
Processing Society Workshop on Statistical Signal Processing (SSP). This is a new
experience for the IEEE Signal Processing Society, as it represents the new series of
workshops that will be organized biennially, taking over from the Statistical Signal and
Array Processing (SSAP) workshops, and is the first occasion on which the Society has
held one of its international workshops in Singapore.

This  Workshop  series  brings  together  leading  authorities  in  the  field  and  experts  from 
keynote  industries  from  all  over  the  world;  and  offers  an  excellent  vehicle  for  the 
exchange  of  latest  research  results,  and  the  identification  of  emerging  trends  and  future 
directions.  To  this  end,  spurred  by  the  enviable  successes  of  previous  SSAPs,  we  are 
committed  to  providing  the  best  possible  platform  based  on  an  exciting  technical  and 
social  program. 

Our Call for Papers was met with an overwhelming response. To maintain the collegial and
convivial atmosphere of the Workshop, the number of technical presentations had, of
necessity, to be limited. Five plenary lectures will be presented at the Workshop,
which in itself speaks volumes for the tremendous growth in statistical signal processing.
In addition, we hope that the attendant all-poster sessions will create the vibrant
exchanges much needed in a workshop setting.

It is our sincere wish that you make the most of the opportunities that await you at
SSP2001. We are certain that a gathering of some of the world's best will, in itself, serve
as an impetus for you to further enrich each other's knowledge and experience.

Organizing this Workshop has been a fascinating and invaluable experience for us.
We  would  like  to  express  our  gratitude  to  our  financial  supporters  whose  contributions 
have  helped  immensely  to  enhance  the  quality  of  the  Workshop. 

Thank you for your tremendous support. We hope that you will make time to enjoy
the  food,  fun  and  excitement  of  Singapore.  We  wish  you  a  most  intellectually  stimulating 
and  socially  fulfilling  Workshop. 

Here’s  to  a  Workshop  that  spells  success  in  every  way  and  provides  an  avenue  for  even 
greater  things  to  come. 


Tariq  S  Durrani 


A.  Rahim  Leyman 


Message  from  the  Technical  Chairman 

Dear  Delegates 

What was formerly known as the IEEE Workshop on Statistical Signal and Array Processing
(SSAP) has taken on a new name. It is now called the IEEE Workshop on Statistical Signal
Processing.

The  11th  IEEE  Workshop  on  Statistical  Signal  Processing  (SSP)  is  a  forum  for  engineers, 
mathematicians  and  scientists  to  present  and  discuss  issues  ranging  from  theoretical  and 
methodological  developments  to  practical  applications  of  statistical  signal  processing. 

It  thus  offers  an  excellent  opportunity  for  you  to  network  and  keep  up  with  your 
counterparts  from  the  world  over.  It  therefore  gives  me  great  pleasure  to  welcome  you  to 
such  a  prestigious  event. 

This  Workshop  offers  an  outstanding  technical  program,  covering  diverse  areas  of  statistical 
signal  processing,  stretching  from  wireless  communications  to  biomedical  signal  processing. 
The  technical  programme  comprises  five  plenary  talks  and  150  papers  presented  as  posters 
in  19  technical  sessions. 

The  response  to  the  Call  for  Papers  was  overwhelming;  we  received  over  200  submissions  of 
the  highest  quality.  All  submissions  underwent  a  thorough  review  by  at  least  two  members  of 
the  technical  committee  and/or  other  distinguished  members  of  the  statistical  signal 
processing  community.  I  wish  to  thank  all  reviewers  for  their  tremendous  effort  and  timely 
responses.  To  maintain  a  workshop  atmosphere,  as  well  as  a  technical  program  of  the 
highest  quality,  we  accommodated  64  percent  of  the  regular  submissions. 

We  have  included  6  special  sessions  in  the  technical  programme.  These  comprise  regular  as 
well as invited papers. I wish to thank the special session organizers for putting much effort
into securing highly qualified speakers for their sessions.

A  lot  of  effort  has  been  put  in  by  many  quarters  to  ensure  that  this  event  runs  smoothly.  I 
would like to thank each and every one for their tireless work. I would like to especially
extend  my  gratitude  to  the  plenary  speakers  for  agreeing  to  participate  and  share  their 
knowledge  and  expertise  with  the  statistical  signal  processing  community. 

I  am  certain  that  you  will  find  the  technical  program  interesting  and  inspiring.  Aside  from 
serious  work,  I  also  hope  that  many  of  you  will  be  able  to  take  advantage  of  the  many 
sightseeing  and  gastronomic  opportunities  Singapore  has  to  offer. 


A.M.  Zoubir 
ABSTRACT


Aspects  of  Contemporary  Statistical  Methods 

Peter  Hall 

Australian  National  University 

Commenting  on  the  development  of  statistics  early  in  the  20th  century,  the  UCLA  historian 
Theodore  Porter  wrote  that  "the  foundations  of  mathematical  statistics  were  laid  between 
1890  and  1930",  and  argued  that  "the  principal  families  of  techniques  for  analyzing 
numerical  data  were  established  during  the  same  period."  There  was  of  course  a  revolution 
in  quantitative  data  analysis  in  the  early  part  of  last  century,  leading  to  the  development  of 
the  subject  we  know  today  as  Statistics.  And  at  the  time  Porter  wrote,  15  years  ago,  he 
would  also  have  been  correct  in  his  second  assertion.  However,  it  would  be  difficult  to 
justify  the  same  remarks  today.  The  speed  and  memory  of  computers  have  increased  one 
thousand  fold  since  1986,  and  the  second  revolution  in  statistics,  certainly  motivated  and 
perhaps  driven  by  developments  in  computing,  has  begun  to  fundamentally  change 
statistical  methodology.  It  is  a  long  way  from  running  its  course.  Over  the  next  few  decades 
it  will  transform  the  subject  into  something  that  is  quite  different,  in  terms  of  its  range  and 
the  emphases  on  types  of  problems  that  it  treats,  from  that  which  we  know  today.  If  the 
development  of  statistics  had  taken  place  in  the  environment  of  contemporary  advances  in 
computing  then  the  subject  would  most  likely  be  less  mathematical,  and  more  of  an 
experimental science, than it is today. The talk will discuss some of the changes, in areas of
resampling  and  Monte  Carlo  methods,  and  outline  new  directions  for  at  least  the  near 
future. 
ABSTRACT


A  Geometric  and  Multiresolution  Analysis  Approach  to 

Robust  Detection 

Jose M. F. Moura

Carnegie  Mellon  University,  USA 

Detection  algorithms  whose  design  takes  into  account  prior  knowledge  about  the  signals 
and  the  channel  face  a  quandary:  they  provide  marked  improvement  in  performance 
when  the  field  operating  conditions  match  well  this  available  knowledge;  but  they 
experience  strong  degradation  when  the  actual  conditions  depart  from  the  assumed  ones. 
In  other  words,  high  resolution  and  robustness  are  commonly  at  odds.  A  third  important 
variable  affecting  this  tradeoff  is  the  computational  complexity  of  the  solution.  I  will 
describe  a  geometric  based  approach  to  designing  detectors  that  leads  to  a  satisfying 
compromise:  simple  to  implement  detectors  that  are  robust  to  mismatches  and  that 
exhibit  good  performance.  The  approach  designs  a  representation  subspace  that  is  a  good 
approximation  (in  the  gap  metric  sense)  to  the  signal  set  (a  priori  information),  and  uses 
multiresolution  and  wavelet  analysis  to  design  the  representation  subspace  and 
implement  the  detector.  I  will  illustrate  the  approach  with  multipath  channels,  and 
present  detection  results  that  illustrate  the  robustness  of  the  geometric  gap  detector. 
ABSTRACT


Entropy,  Complexity  and  Chaos  in  Brain  Rhythms 

Nitish  V.  Thakor 

The  Johns  Hopkins  School  of  Medicine  Baltimore,  USA 

The  classical  approaches  to  analysis  and  interpretation  of  the  brain  rhythm,  namely  the 
EEG,  are  to  employ  non-parametric  or  parametric  signal  processing  methods.  These  linear 
systems  approaches  to  brain  rhythm  analysis  have  now  given  way  to  more  advanced 
methodologies. These methods recognize that the brain rhythms are non-stationary and
the brain's responses to stimuli are non-linear. While spectral analysis has proved its value in
sleep  staging  analysis,  higher  order  spectral  analysis  has  been  useful  in  determining  depth  of 
anesthesia.  Complexity  analysis  has  been  shown  to  discriminate  neurological  disorders  such 
as  schizophrenia.  Chaotic  dynamics  have  been  observed  in  brain  rhythms  preceding  or 
resulting  from  epileptic  seizures.  The  concepts  derived  from  information  theory,  including 
measures  of  entropy,  have  been  useful  in  characterizing  brain  injury.  Advanced  signal 
processing  has  long  been  of  interest  in  application  areas  such  as  diagnosis  of  brain 
disorders,  epilepsy,  sleep  or  anesthesia  analysis,  and  more  recently  in  brain-computer 
interfaces. An emerging application being developed by our group is monitoring the brain's
rhythm after neurological trauma or injury. Advanced quantitative analysis, based on the
information  and  entropy  analysis  methods,  has  been  used  by  our  group  to  distinguish  and 
characterize  the  injury  response.  This  presentation  will  review  the  state  of  the  art  of  brain 
rhythm  analysis  using  the  emerging  signal  processing  methods  and  will  especially  help 
theoreticians  targeting  emergent,  significant  biomedical  applications. 
Abstract

Global Navigation Satellite System carrier phase ambiguity resolution
is the key to high precision positioning and navigation. In
this contribution a brief review is given of the probabilistic theory
of integer carrier phase ambiguity estimation. Various ambiguity
estimators are discussed. Among them are the estimators of integer
rounding, integer bootstrapping, integer least-squares and the
Bayesian solution. We also discuss the various relationships that
exist between these estimators.


1.  INTRODUCTION 

Global Navigation Satellite System (GNSS) ambiguity resolution
is the process of resolving the unknown cycle ambiguities of double
difference (DD) carrier phase data as integers. The sole purpose
of ambiguity resolution is to use the integer ambiguity constraints
as a means of improving significantly on the precision
of the remaining model parameters, such as baseline coordinates
and/or atmospheric (troposphere, ionosphere) delays.

Ambiguity resolution applies to a great variety of current and
future GNSS models. These models may differ greatly in complexity
and diversity. They range from single-baseline models used
for kinematic positioning to multi-baseline models used as a tool
for studying geodynamic phenomena. The models may or may not
have the relative receiver-satellite geometry included. They may
also be discriminated as to whether the slave receiver(s) are stationary
or in motion. When in motion, one solves for one or more
trajectories, since with the receiver-satellite geometry included,
one will have new coordinate unknowns for each epoch. One may
also discriminate between the models as to whether or not the differential
atmospheric delays (ionosphere and troposphere) are included
as unknowns. In the case of sufficiently short baselines they
are usually excluded.

Apart from the current Global Positioning System (GPS) models,
carrier phase ambiguity resolution also applies to the future
modernized GPS and the future European Galileo GNSS. An overview
of GNSS models, together with their applications in surveying,
navigation, geodesy and geophysics, can be found in textbooks
such as [Hofmann-Wellenhof et al., 1997], [Leick, 1995],
[Parkinson and Spilker, 1996], [Strang and Borre, 1997] and
[Teunissen and Kleusberg, 1998].

In this contribution we review the probabilistic theory for integer
carrier phase ambiguity estimation. It is the key to high precision
GNSS positioning and navigation. This contribution is organized
as follows. In section 2 we introduce a general class of integer
ambiguity estimators, determine their probability mass functions
and show how their variability affects the uncertainty in the
computed GNSS baselines. This theory is worked out in sections
3 and 4 for two of the most important integer ambiguity estimators.
In section 3 we discuss the properties of integer bootstrapping and
in section 4 those of integer least-squares. In the final section,
section 5, we discuss the Bayesian solution to carrier phase ambiguity
resolution. Although the Bayesian approach has not yet found
widespread use in any of the GNSS applications, the basic concepts
involved are of interest in their own right. Where possible,
the various ambiguity estimation principles are compared.

2.  INTEGER  AMBIGUITY  RESOLUTION 
2.1.  The  GNSS  model 

As  our  point  of  departure  we  will  take  the  following  system  of 
linear(ized)  observation  equations 

y  =  Aa  +  Bb  +  e  (1) 

where $y$ is the given GNSS data vector of order $m$, $a$ and $b$ are
the unknown parameter vectors of order $n$ and $p$ respectively, and
$e$ is the noise vector. In principle all the GNSS models can
be cast in this frame of observation equations. The data vector $y$
will usually consist of the 'observed minus computed' single- or
dual-frequency double-difference (DD) phase and/or pseudorange
(code) observations accumulated over all observation epochs. The
entries of vector $a$ are then the DD carrier phase ambiguities,
expressed in units of cycles rather than range. They are known to
be integers, $a \in \mathbb{Z}^n$. The entries of the vector $b$ will consist of
the remaining unknown parameters, such as for instance baseline
components (coordinates) and possibly atmospheric delay parameters
(troposphere, ionosphere). They are known to be real-valued,
$b \in \mathbb{R}^p$.

0-7803-7011-2/01/$10.00 ©2001 IEEE

The procedure which is usually followed for solving the GNSS
model (1) can be divided into three steps. In the first step one
simply disregards the integer constraints $a \in \mathbb{Z}^n$ on the ambiguities
and performs a standard least-squares adjustment. As a result one
obtains the (real-valued) estimates of $a$ and $b$, together with their
variance-covariance (vc-) matrix

$$\begin{bmatrix} \hat{a} \\ \hat{b} \end{bmatrix}, \qquad \begin{bmatrix} Q_{\hat{a}} & Q_{\hat{a}\hat{b}} \\ Q_{\hat{b}\hat{a}} & Q_{\hat{b}} \end{bmatrix} \qquad (2)$$



This solution is referred to as the 'float' solution. In the second
step the 'float' ambiguity estimate $\hat{a}$ is used to compute the
corresponding integer ambiguity estimate $\check{a}$. This implies that a
mapping $S : \mathbb{R}^n \to \mathbb{Z}^n$, from the n-dimensional space of reals to the
n-dimensional space of integers, is introduced such that

$$\check{a} = S(\hat{a}) \qquad (3)$$

Once the integer ambiguities are computed, they are used in the
third step to finally correct the 'float' estimate of $b$. As a result one
obtains the 'fixed' solution

$$\check{b} = \hat{b} - Q_{\hat{b}\hat{a}} Q_{\hat{a}}^{-1} (\hat{a} - \check{a}) \qquad (4)$$

In the present review we will primarily focus our attention on the
probabilistic properties of (3) and (4).
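
Steps two and three can be illustrated with a minimal Python sketch for the scalar case (n = p = 1). The function names, the choice of plain rounding as the map S, and the numerical values are assumptions made for illustration only; they are not taken from the text.

```python
import math

def nearest_integer(x):
    """Round to the nearest integer (ties go upward; they have zero probability)."""
    return math.floor(x + 0.5)

def fixed_solution(a_float, b_float, q_a, q_ba, S=nearest_integer):
    """Steps two and three for one ambiguity and one baseline parameter.

    a_float, b_float : 'float' estimates from the least-squares adjustment (step one)
    q_a              : variance of the float ambiguity
    q_ba             : covariance between the float baseline and the float ambiguity
    S                : integer map of equation (3); rounding is only one possible choice
    """
    a_fixed = S(a_float)                                    # equation (3)
    b_fixed = b_float - (q_ba / q_a) * (a_float - a_fixed)  # equation (4)
    return a_fixed, b_fixed

# Hypothetical float solution, chosen only to exercise the function.
a_fixed, b_fixed = fixed_solution(a_float=2.3, b_float=10.0, q_a=0.04, q_ba=0.02)
print(a_fixed, b_fixed)
```

In this sketch the baseline correction is proportional to the ambiguity residual, so a float ambiguity that is already close to an integer changes the baseline estimate only slightly.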

2.2.  Admissible  integer  estimation 

There are many ways of computing an integer ambiguity vector $\check{a}$
from its real-valued counterpart $\hat{a}$. To each such method belongs a
mapping $S : \mathbb{R}^n \to \mathbb{Z}^n$ from the n-dimensional space of real numbers
to the n-dimensional space of integers. Due to the discrete
nature of $\mathbb{Z}^n$, the map $S$ will not be one-to-one, but instead a
many-to-one map. This implies that different real-valued ambiguity vectors
will be mapped to the same integer vector. One can therefore
assign a subset $S_z \subset \mathbb{R}^n$ to each integer vector $z \in \mathbb{Z}^n$:

$$S_z = \{ x \in \mathbb{R}^n \mid z = S(x) \}, \quad z \in \mathbb{Z}^n \qquad (5)$$

The subset $S_z$ contains all real-valued ambiguity vectors that will
be mapped by $S$ to the same integer vector $z \in \mathbb{Z}^n$. This subset
is referred to as the pull-in region of $z$ [Jonkman, 1998]. It is the
region in which all ambiguity 'float' solutions are pulled to the
same 'fixed' ambiguity vector $z$. Using the pull-in regions, one can
give an explicit expression for the corresponding integer ambiguity
estimator. It reads

$$\check{a} = \sum_{z \in \mathbb{Z}^n} z \, s_z(\hat{a}) \qquad (6)$$

with the indicator function

$$s_z(\hat{a}) = \begin{cases} 1 & \text{if } \hat{a} \in S_z \\ 0 & \text{otherwise} \end{cases}$$

Since  the  pull-in  regions  define  the  integer  estimator  completely, 
one  can  define  classes  of  integer  estimators  by  imposing  various 
conditions  on  the  pull-in  regions.  One  such  class  is  referred  to  as 
the  class  of  admissible  integer  estimators.  These  integer  estimators 
are  defined  as  follows. 

Definition  1 

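The integer estimator $\check{a} = \sum_{z \in \mathbb{Z}^n} z \, s_z(\hat{a})$ is said to be admissible if

(i) $\bigcup_{z \in \mathbb{Z}^n} S_z = \mathbb{R}^n$

(ii) $\mathrm{int}(S_{z_1}) \cap \mathrm{int}(S_{z_2}) = \emptyset$, $\forall z_1, z_2 \in \mathbb{Z}^n$, $z_1 \neq z_2$

(iii) $S_z = z + S_0$, $\forall z \in \mathbb{Z}^n$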


This definition is motivated as follows. Each one of the above three
conditions describes a property that it seems reasonable for an
arbitrary integer ambiguity estimator to possess. The first
condition states that the pull-in regions should not leave any gaps
and the second that they should not overlap. The absence of gaps
is needed in order to be able to map any 'float' solution $\hat{a} \in \mathbb{R}^n$
to $\mathbb{Z}^n$, while the absence of overlaps is needed to guarantee that
the 'float' solution is mapped to just one integer vector. Note that
we allow the pull-in regions to have common boundaries. This
is permitted if we assume to have zero probability that $\hat{a}$ lies on
one of the boundaries. This will be the case when the probability
density function (pdf) of $\hat{a}$ is continuous.

The third and last condition follows from the requirement that
$S(x + z) = S(x) + z$, $\forall x \in \mathbb{R}^n, z \in \mathbb{Z}^n$. This condition, too, is a
reasonable one to ask for. It states that when the 'float' solution is
perturbed by $z \in \mathbb{Z}^n$, the corresponding integer solution is perturbed
by the same amount. This property allows one to apply the integer
remove-restore technique: $S(\hat{a} - z) + z = S(\hat{a})$. It therefore allows
one to work with the fractional parts of the entries of $\hat{a}$, instead of
with its complete entries.
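
The remove-restore property can be checked numerically for one admissible map. The following Python sketch uses componentwise rounding as S; the helper names and the sample values are assumptions introduced for illustration.

```python
import math

def round_vector(x):
    """Componentwise rounding to the nearest integer, one admissible choice of S."""
    return [math.floor(xi + 0.5) for xi in x]

def remove_restore(x, z, S=round_vector):
    """Compute S(x) as S(x - z) + z for an integer vector z."""
    reduced = [xi - zi for xi, zi in zip(x, z)]          # remove the integer part
    return [si + zi for si, zi in zip(S(reduced), z)]    # restore it afterwards

a_float = [1024.3, -517.8]              # hypothetical float ambiguities
z = [math.floor(v) for v in a_float]    # integer parts removed before rounding
print(round_vector(a_float), remove_restore(a_float, z))
```

Both calls return the same integer vector, which is what condition (iii) guarantees for an admissible estimator.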

With the division of $\mathbb{R}^n$ into mutually exclusive pull-in regions,
we are now in the position to consider the distribution of
$\check{a}$. This distribution is of the discrete type and it will be denoted
as $P(\check{a} = z)$. It is a probability mass function (pmf), having zero masses
at nongrid points and nonzero masses at some or all grid points.
If we denote the continuous probability density function of $\hat{a}$ as
$p_{\hat{a}}(x)$, the distribution of $\check{a}$ follows as

$$P(\check{a} = z) = \int_{S_z} p_{\hat{a}}(x) \, dx, \quad z \in \mathbb{Z}^n \qquad (7)$$

This expression holds for any distribution the 'float' ambiguities $\hat{a}$
might have. In most GNSS applications, however, one assumes the
vector of observables $y$ to be normally distributed. The estimator
$\hat{a}$ is therefore normally distributed too, with mean $a \in \mathbb{Z}^n$ and
vc-matrix $Q_{\hat{a}}$. Its probability density function reads

$$p_{\hat{a}}(x) = \frac{1}{\sqrt{\det(Q_{\hat{a}})} \, (2\pi)^{\frac{1}{2}n}} \exp\{ -\tfrac{1}{2} \| x - a \|^2_{Q_{\hat{a}}} \} \qquad (8)$$

with the squared weighted norm $\| \cdot \|^2_{Q_{\hat{a}}} = (\cdot)^T Q_{\hat{a}}^{-1} (\cdot)$. Note that
$P(\check{a} = a)$ equals the probability of correct integer ambiguity
estimation. It describes the expected success rate of GNSS ambiguity
resolution.
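
For a single ambiguity (n = 1) fixed by rounding, the pull-in region of z is the interval [z - 1/2, z + 1/2], so the integral in (7) reduces to a difference of two normal distribution function values. The Python sketch below evaluates this special case; the function names and the sample standard deviations are illustrative assumptions.

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def rounding_pmf(z, a, sigma):
    """P(a_check = z) from (7) for n = 1, float ambiguity ~ N(a, sigma^2)."""
    return normal_cdf((z + 0.5 - a) / sigma) - normal_cdf((z - 0.5 - a) / sigma)

def success_rate(sigma):
    """P(a_check = a); for rounding it does not depend on the integer a itself."""
    return rounding_pmf(0, 0, sigma)

for sigma in (0.1, 0.3, 0.5):   # standard deviations in cycles
    print(sigma, round(success_rate(sigma), 4))
```

The loop shows how quickly the success rate drops as the float ambiguity becomes less precise, which is why the precision of the float solution matters before any integer is fixed.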


2.3. The baseline solution


We are now in the position to determine the pdf of the 'fixed' baseline
estimator (4). In order to determine this pdf, one needs to
propagate the uncertainty of the 'float' solution, $\hat{a}$ and $\hat{b}$, as well as
the uncertainty of the integer solution $\check{a}$ through (4). Should one
neglect the random character of the integer solution and therefore
consider the ambiguity vector $\check{a}$ as deterministic and equal to, say,
$z$, then the pdf of $\check{b}$ would equal the conditional baseline distribution

$$p_{\hat{b}|\hat{a}}(x \mid z) = \frac{\exp\{ -\tfrac{1}{2} \| x - b(z) \|^2_{Q_{\hat{b}|\hat{a}}} \}}{\sqrt{\det Q_{\hat{b}|\hat{a}}} \, (2\pi)^{\frac{1}{2}p}} \qquad (9)$$

with conditional mean $b(z) = b - Q_{\hat{b}\hat{a}} Q_{\hat{a}}^{-1} (a - z)$, conditional variance
matrix $Q_{\hat{b}|\hat{a}} = Q_{\hat{b}} - Q_{\hat{b}\hat{a}} Q_{\hat{a}}^{-1} Q_{\hat{a}\hat{b}}$ and $\| \cdot \|^2_{Q_{\hat{b}|\hat{a}}} = (\cdot)^T Q_{\hat{b}|\hat{a}}^{-1} (\cdot)$.

However, since $\check{a}$ is random and not deterministic, the conditional
baseline distribution will give a too optimistic description of the
quality of the 'fixed' baseline. To get a correct description of the
'fixed' baseline's pdf, the integer ambiguity's pmf needs to be considered.
As the following theorem shows, this results in a baseline
distribution which generally will be multi-modal.

Theorem  1  (Pdf  of  the  ’fixed’  baseline) 

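Let the 'float' solution, $\hat{a}$ and $\hat{b}$, be normally distributed with mean
$a \in \mathbb{Z}^n$ and mean $b \in \mathbb{R}^p$, and vc-matrix (2), let $\check{a}$ be an admissible
integer estimator and let the 'fixed' baseline $\check{b}$ be given as in (4).
The pdf of $\check{b}$ then reads

$$p_{\check{b}}(x) = \sum_{z \in \mathbb{Z}^n} p_{\hat{b}|\hat{a}}(x \mid z) \, P(\check{a} = z) \qquad (10)$$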


Note that, although the model (1) is linear and the observables normally
distributed, the distribution of the 'fixed' baseline is not normal,
but multi-modal. As the theorem shows, the 'fixed' baseline
distribution equals an infinite sum of weighted conditional baseline
distributions. These conditional baseline distributions $p_{\hat{b}|\hat{a}}(x \mid z)$
are shifted versions of one another. The size and direction of the
shift is governed by $Q_{\hat{b}\hat{a}} Q_{\hat{a}}^{-1} z$, $z \in \mathbb{Z}^n$. Each of the conditional
baseline distributions in the infinite sum is downweighted. These
weights are given by the probability masses of the distribution of
the integer ambiguity estimator $\check{a}$. This shows that
the dependence of the 'fixed' baseline distribution on the choice of
integer estimator is only felt through the weights $P(\check{a} = z)$.


2.4.  On  the  quality  of  the  ’fixed’  baseline 

In order to describe the quality of the 'fixed' baseline, one would
like to know how close one can expect the baseline estimate $\check{b}$ to
be to the unknown, but true baseline value $b$. As a measure of
confidence, we take

$$P(\check{b} \in R) = \int_R p_{\check{b}}(x) \, dx \quad \text{with } R \subset \mathbb{R}^p \qquad (11)$$

But in order to evaluate this integral, we first need to make a choice
about the shape and location of the subset $R$. Since it is common
practice in GNSS positioning to use the vc-matrix of the conditional
baseline estimator as a measure of precision for the 'fixed'
baseline, the vc-matrix $Q_{\hat{b}|\hat{a}}$ will be used to define the shape of the
confidence region. For its location, we choose the confidence region
to be centered at $b$. After all, we would like to know by how
much the baseline estimate $\check{b}$ can be expected to differ from the
true, but unknown baseline value $b$. That is, one would like (11) to
be a measure of the baseline's probability of concentration about
$b$.

With these choices on shape and location, the region $R$ takes
the form

$$R = \{ x \in \mathbb{R}^p \mid (x - b)^T Q_{\hat{b}|\hat{a}}^{-1} (x - b) \le \beta^2 \} \qquad (12)$$

The size of the region can be varied by varying $\beta$. The following
theorem shows how the baseline's probability of concentration
(11) can be evaluated as a weighted sum of probabilities of noncentral
Chi-square distributions.

Theorem 2 (The 'fixed' baseline's probability of concentration)
Let $\check{b}$ be the 'fixed' baseline estimator, let $R$ be defined as in (12),
and let $\chi^2(p, \lambda_z)$ denote the noncentral Chi-square distribution with
$p$ degrees of freedom and noncentrality parameter $\lambda_z$. Then

$$P(\check{b} \in R) = \sum_{z \in \mathbb{Z}^n} P(\chi^2(p, \lambda_z) \le \beta^2) \, P(\check{a} = z) \qquad (13)$$

with

$$\lambda_z = \| \nabla \check{b}_z \|^2_{Q_{\hat{b}|\hat{a}}} \quad \text{and} \quad \nabla \check{b}_z = Q_{\hat{b}\hat{a}} Q_{\hat{a}}^{-1} (z - a)$$

This result shows that the probability of the 'fixed' baseline lying
inside the ellipsoidal region $R$ centered at $b$ equals an infinite sum
of probability products. If one considers the two probabilities of
these products separately, two effects are observed: first, the
probabilistic effect of shifting the conditional baseline estimator away
from $b$, and second, the probabilistic effect of the peakedness or
nonpeakedness of the ambiguity pmf. The second effect is related
to the expected performance of ambiguity resolution, while the
first effect has to do with the sensitivity of the baseline to changes
in the values of the integer ambiguities. The first effect is measured
by the noncentrality parameter $\lambda_z$. Since the tail of a noncentral
Chi-square distribution becomes heavier when the noncentrality
parameter increases, while the degrees of freedom remain fixed,
$P(\chi^2(p, \lambda_z) \le \beta^2)$ gets smaller when $\lambda_z$ gets larger.

The two probabilities in the product reach their maximum values
when $z = a$. The following corollary shows how these two
maxima can be used to lower bound and to upper bound the probability
$P(\check{b} \in R)$. Such bounds are of importance for practical purposes,
since it is difficult in general to evaluate (13) exactly.

Corollary 1 (Lower and upper bounds)

Let $\check{b}$ be the 'fixed' baseline estimator and let $R$ be defined as in
(12). Then

$$P(\hat{b}_{|\hat{a}=a} \in R) \, P(\check{a} = a) \le P(\check{b} \in R) \le P(\hat{b}_{|\hat{a}=a} \in R) \qquad (14)$$

with

$$P(\hat{b}_{|\hat{a}=a} \in R) = P(\chi^2(p, 0) \le \beta^2)$$

Note that the two bounds relate the probability of the 'fixed' baseline
estimator to that of the conditional estimator and the ambiguity
success rate. The above bounds become tight when the
ambiguity success rate approaches one. This shows that, although the
probability of the conditional estimator always overestimates the
probability of the 'fixed' baseline estimator, the two probabilities
are close for large values of the success rate. This implies that
in case of GNSS ambiguity resolution, one should first evaluate the
success rate $P(\check{a} = a)$ and make sure that its value is close enough
to one, before making any inferences on the basis of the distribution
of the conditional baseline estimator. In other words, the
(unimodal) distribution of the conditional estimator is a good
approximation to the (multimodal) distribution of the 'fixed'
baseline estimator, when the success rate is sufficiently close to
one.

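
The bounds in (14) can be checked by simulation in the scalar case n = p = 1 with rounding as the integer estimator. The Python sketch below is only an illustration under these assumptions; the function name, parameter names and numerical values are hypothetical. For p = 1 the upper bound P(χ²(1, 0) ≤ β²) equals erf(β/√2), and the success rate of rounding has the closed form used in the code.

```python
import math
import random

def concentration_check(sigma_a, gain, sigma_cond, beta, trials=100_000, seed=1):
    """Monte Carlo estimate of P(b_check in R) with the bounds of (14), n = p = 1.

    sigma_a    : standard deviation of the float ambiguity
    gain       : Q_ba / Q_a, the baseline shift per cycle of ambiguity error
    sigma_cond : standard deviation of the conditional baseline estimator
    beta       : half-width of the region R in units of sigma_cond
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a_err = rng.gauss(0.0, sigma_a)       # a_hat - a
        a_fix_err = math.floor(a_err + 0.5)   # a_check - a, obtained by rounding
        # From (4): b_check - b = gain * (a_check - a) + conditional noise
        b_err = gain * a_fix_err + rng.gauss(0.0, sigma_cond)
        hits += abs(b_err) <= beta * sigma_cond
    estimate = hits / trials
    upper = math.erf(beta / math.sqrt(2.0))                     # P(chi2(1, 0) <= beta^2)
    success = math.erf(1.0 / (2.0 * math.sqrt(2.0) * sigma_a))  # P(a_check = a)
    return upper * success, estimate, upper

lower, estimate, upper = concentration_check(sigma_a=0.3, gain=0.15, sigma_cond=0.1, beta=2.0)
print(lower, estimate, upper)
```

With a smaller `sigma_a` the success rate approaches one and the lower and upper values move together, which is the tightening of the bounds described in the text.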


3.  INTEGER  BOOTSTRAPPING 
3.1.  The  bootstrapped  estimator 

The distributional results presented so far hold for any admissible
ambiguity estimator. The simplest way to obtain an integer vector
from the real-valued 'float' solution is to round each of the entries
of $\hat{a}$ to its nearest integer. The corresponding integer estimator
therefore reads

$$\check{a}_R = ([\hat{a}_1], \ldots, [\hat{a}_n])^T \qquad (15)$$

where '[.]' denotes rounding to the nearest integer. The pull-in
region of this integer estimator equals the multivariate version of
the unit-square.

Another relatively simple integer ambiguity estimator is the
bootstrapped estimator. The bootstrapped estimator can be seen
as a generalization of the previous estimator. It still makes use of
integer rounding, but it also takes some of the correlation between
the ambiguities into account. The bootstrapped estimator follows
from a sequential conditional least-squares adjustment and it is
computed as follows. If $n$ ambiguities are available, one starts with
the first ambiguity $\hat{a}_1$, and rounds its value to the nearest integer.
Having obtained the integer value of this first ambiguity, the real-valued
estimates of all remaining ambiguities are then corrected by
virtue of their correlation with the first ambiguity. Then the second,
but now corrected, real-valued ambiguity estimate is rounded
to its nearest integer. Having obtained the integer value of the
second ambiguity, the real-valued estimates of all remaining $n - 2$
ambiguities are then again corrected, but now by virtue of their
correlation with the second ambiguity. This process is continued
until all ambiguities are considered. We thus have the following
definition.

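Definition 2 (Integer bootstrapping)

Let $\hat{a} = (\hat{a}_1, \ldots, \hat{a}_n)^T \in \mathbb{R}^n$ be the ambiguity 'float' solution and
let $\check{a}_B = (\check{a}_{B,1}, \ldots, \check{a}_{B,n})^T \in \mathbb{Z}^n$ denote the corresponding integer
bootstrapped solution. The entries of the bootstrapped ambiguity
estimator are then defined as

$$\begin{aligned}
\check{a}_{B,1} &= [\hat{a}_1] \\
\check{a}_{B,2} &= [\hat{a}_{2|1}] = [\hat{a}_2 - \sigma_{21} \sigma_1^{-2} (\hat{a}_1 - \check{a}_{B,1})] \\
&\;\;\vdots \\
\check{a}_{B,n} &= [\hat{a}_{n|N}] = \Big[\hat{a}_n - \sum_{j=1}^{n-1} \sigma_{n,j|J} \, \sigma_{j|J}^{-2} (\hat{a}_{j|J} - \check{a}_{B,j})\Big]
\end{aligned} \qquad (16)$$

where '[.]' denotes the operation of rounding to the nearest integer,
$\sigma_{i,j|J}$ denotes the covariance between $\hat{a}_i$ and $\hat{a}_{j|J}$, and $\sigma^2_{j|J}$ is
the variance of $\hat{a}_{j|J}$. The shorthand notation $\hat{a}_{i|I}$ stands for the
ith least-squares ambiguity obtained through a conditioning on the
previous $I = \{1, \ldots, (i-1)\}$ sequentially rounded ambiguities.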

Note that the bootstrapped estimator is not unique. Changing the
order in which the ambiguities appear in the vector $\hat a$ will already
produce a different bootstrapped estimator. Although the principle
of bootstrapping remains the same, every choice of ambiguity
parametrization has its own bootstrapped estimator.
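
The recursion of Definition 2 can be sketched in a few lines. The following is a minimal illustration, not the author's implementation: it assumes NumPy, takes the unit lower triangular factor $L$ of $Q_{\hat a} = LDL^T$ from a Cholesky factor, and uses the fact that the entries of $L$ are the ratios $\sigma_{i,j|J}\sigma_{j|J}^{-2}$ appearing in (16). The function names are hypothetical.

```python
import numpy as np

def unit_lower_factor(Q):
    """Return the unit lower triangular L of Q = L D L^T (via Cholesky)."""
    C = np.linalg.cholesky(Q)      # Q = C C^T, C lower triangular
    return C / np.diag(C)          # scale each column so the diagonal is 1

def integer_bootstrap(a_float, Q):
    """Sequentially round the conditional estimates, as in eq. (16)."""
    L = unit_lower_factor(Q)
    n = len(a_float)
    fixed = np.zeros(n)
    resid = np.zeros(n)            # conditional estimate minus its rounded value
    for i in range(n):
        # correct the i-th float ambiguity for the previously rounded ones
        cond = a_float[i] - L[i, :i] @ resid[:i]
        fixed[i] = np.round(cond)
        resid[i] = cond - fixed[i]
    return fixed.astype(int)

Q = np.array([[4.0, 2.0], [2.0, 2.0]])
print(integer_bootstrap(np.array([1.4, 2.6]), Q))          # correlated case
print(integer_bootstrap(np.array([1.4, 2.6]), np.eye(2)))  # plain rounding
```

With this illustrative vc-matrix the correlated case yields (1, 2), whereas plain rounding yields (1, 3). Swapping the order of the two ambiguities (and permuting the vc-matrix accordingly) yields (3, 2), that is (2, 3) in the original order, which illustrates the non-uniqueness noted above.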

3.2.  The  bootstrapped  pull-in  regions 

The pull-in regions for rounding are unit cubes centred at the integer
grid points. For bootstrapping, the shape of the pull-in regions
depends on the vc-matrix of the ambiguities. They coincide
with the unit cubes only when the vc-matrix is diagonal, since
bootstrapping reduces to rounding in the absence of
any correlation between the ambiguities. The following theorem
gives a description of the bootstrapped pull-in regions in the
general case.

Theorem 3 (Bootstrapped pull-in regions)

The pull-in regions of the bootstrapped ambiguity estimator
$\check a_B = (\check a_{B,1}, \ldots, \check a_{B,n})^T \in Z^n$ are given as

$$
S_{B,z} = \{x \in R^n \mid |c_i^T L^{-1}(x - z)| \le \tfrac{1}{2},\ i = 1, \ldots, n\}, \quad \forall z \in Z^n \tag{17}
$$

where $L$ denotes the unique unit lower triangular matrix
of the ambiguity vc-matrix decomposition $Q_{\hat a} = LDL^T$ and $c_i$
denotes the ith canonical unit vector, having a 1 as its ith entry and
zeros otherwise.

That the bootstrapped estimator is indeed admissible can now be
seen as follows. The first two conditions of Definition 1 are easily
verified using the definition of the bootstrapped estimator. Since
every real-valued vector $\hat a$ will be mapped by the bootstrapped
estimator to an integer vector, the pull-in regions $S_{B,z}$ cover $R^n$
without any gaps. There is also no overlap between the pull-in
regions, since - apart from boundary ties - any real-valued vector $\hat a$
is mapped to not more than one integer vector. To verify the last
condition of Definition 1, we make use of (17). From

$$
\begin{aligned}
S_{B,z} &= \{x \in R^n \mid |c_i^T L^{-1}(x - z)| \le \tfrac{1}{2},\ i = 1, \ldots, n\} \\
        &= \{x \in R^n \mid |c_i^T L^{-1} y| \le \tfrac{1}{2},\ x = y + z,\ i = 1, \ldots, n\} \\
        &= S_{B,0} + z
\end{aligned}
$$

it follows that all bootstrapped pull-in regions are translated copies
of $S_{B,0}$. All pull-in regions have therefore the same shape and the
same volume. Their volumes all equal 1. This can be shown by
transforming $S_{B,0}$ to the unit cube centered at the origin. Consider
the linear transformation $y = L^{-1}x$. Then

$$
L^{-1}(S_{B,0}) = \{y \in R^n \mid |c_i^T y| \le \tfrac{1}{2},\ i = 1, \ldots, n\}
$$

equals the unit cube centered at the origin. Since the determinant
of the unit lower triangular matrix $L^{-1}$ equals one and since the
volume of the unit cube equals one, it follows that the volume of
$S_{B,0}$ must equal one as well. To infer the shape of the bootstrapped
pull-in region, we consider the two-dimensional case first. Let the
lower triangular matrix $L$ be given as

$$
L = \begin{bmatrix} 1 & 0 \\ l & 1 \end{bmatrix}
$$

Then

$$
\begin{aligned}
S_{B,0} &= \{x \in R^2 \mid |c_i^T L^{-1} x| \le \tfrac{1}{2},\ i = 1, 2\} \\
        &= \{x \in R^2 \mid |x_1| \le \tfrac{1}{2},\ |x_2 - l x_1| \le \tfrac{1}{2}\}
\end{aligned}
$$

which shows that the two-dimensional pull-in region is a
parallelogram. It is bounded by the two vertical lines $x_1 = 1/2$
and $x_1 = -1/2$, and the two parallel sloped lines $x_2 = l x_1 + 1/2$
and $x_2 = l x_1 - 1/2$. The direction of the slope is governed by
$l = \sigma_{21}\sigma_1^{-2}$. Hence, in the absence of correlation between the
two ambiguities, the parallelogram reduces to the unit square. In
higher dimensions the above construction of the pull-in region can
be continued. In three dimensions for instance, the intersection
of the pull-in region with the $x_1x_2$-plane remains a parallelogram,
while along the third axis the pull-in region becomes bounded by
two parallel planes.
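
As a small numerical illustration of (17), the membership test for a bootstrapped pull-in region can be sketched as below. This is only a sketch under stated assumptions: NumPy is assumed, the function names are hypothetical, and the vc-matrix used in the example is made up for illustration.

```python
import numpy as np

def unit_lower_factor(Q):
    """Unit lower triangular L of Q = L D L^T, taken from a Cholesky factor."""
    C = np.linalg.cholesky(Q)
    return C / np.diag(C)

def in_bootstrapped_pull_in_region(x, z, Q):
    """True if |c_i^T L^{-1} (x - z)| <= 1/2 for all i, cf. eq. (17)."""
    L = unit_lower_factor(Q)
    y = np.linalg.solve(L, np.asarray(x, float) - np.asarray(z, float))
    return bool(np.all(np.abs(y) <= 0.5))

Q = np.array([[4.0, 2.0], [2.0, 2.0]])                     # here l = 0.5
print(in_bootstrapped_pull_in_region([1.4, 2.6], [1, 2], Q))  # inside the parallelogram
print(in_bootstrapped_pull_in_region([1.4, 2.6], [1, 3], Q))  # outside, although (1, 3) is the rounded vector
```

For this example the point (1.4, 2.6) lies in the parallelogram belonging to z = (1, 2) and not in the one belonging to z = (1, 3), in agreement with the two-dimensional description above.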

3.3. The bootstrapped pmf

Since the integer bootstrapped estimator is defined as $\check a_B = z \Leftrightarrow
\hat a \in S_{B,z}$, it follows that $P(\check a_B = z) = P(\hat a \in S_{B,z})$. The pmf of $\check a_B$
follows therefore as

$$
P(\check a_B = z) = \int_{S_{B,z}} p_{\hat a}(x)\,dx, \quad z \in Z^n \tag{18}
$$

Hence, the probability that $\check a_B$ coincides with $z$ is given by the
integral of the pdf $p_{\hat a}(x)$ over the bootstrapped pull-in region $S_{B,z} \subset R^n$.
The above expression holds for any distribution the 'float'
ambiguities $\hat a$ might have. In most GNSS applications however, one
usually assumes the vector of observables $y$ to be normally distributed.
For that case the following theorem gives an exact expression of
the bootstrapped pmf.

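Theorem 4 (The integer bootstrapped pmf)

Let $\hat a$ be distributed as $N(a, Q_{\hat a})$, $a \in Z^n$, and let $\check a_B$ be the
corresponding integer bootstrapped estimator. Then

$$
P(\check a_B = z) = \prod_{i=1}^{n}\left[\Phi\!\left(\frac{1 - 2\,l_i^T(a - z)}{2\sigma_{i|I}}\right) + \Phi\!\left(\frac{1 + 2\,l_i^T(a - z)}{2\sigma_{i|I}}\right) - 1\right], \quad z \in Z^n \tag{19}
$$

with

$$
\Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} \exp\{-\tfrac{1}{2}v^2\}\,dv
$$

and with $l_i$ the ith column vector of the unit upper triangular matrix
$L^{-T}$ and $\sigma^2_{i|I}$ the variance of the ith least-squares ambiguity
obtained through a conditioning on the previous $I = \{1, \ldots, (i-1)\}$
ambiguities.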


The bootstrapped pmf equals a product of univariate pmf's and is
therefore easy to compute. Note that the bootstrapped pmf is
completely governed by the ambiguity vc-matrix $Q_{\hat a}$. The pmf follows
once the triangular factor $L$ and the diagonal matrix $D$ of the
decomposition $Q_{\hat a} = LDL^T$ are given. The above result also shows
that the bootstrapped pmf is symmetric about the mean of $\hat a$. This
implies that the bootstrapped estimator $\check a_B$ is an unbiased estimator
of $a \in Z^n$. Since the 'float' solutions, $\hat a$ and $\hat b$, are unbiased too,
it follows from taking the expectation of (4) that the bootstrapped
baseline is also unbiased.
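
To make the product structure of (19) concrete, the sketch below evaluates the bootstrapped pmf for a given integer vector. It is an illustration only: NumPy is assumed, the function names and the example vc-matrix are hypothetical, and $l_i^T(a - z)$ is obtained as the ith entry of $L^{-1}(a - z)$, since $l_i$ is the ith column of $L^{-T}$.

```python
import math
import numpy as np

def std_normal_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bootstrapped_pmf(z, a_true, Q):
    """Evaluate eq. (19): P(bootstrapped solution = z) for float ~ N(a_true, Q)."""
    C = np.linalg.cholesky(Q)
    cond_std = np.diag(C)            # sqrt of the diagonal of D in Q = L D L^T
    L = C / cond_std                 # unit lower triangular factor
    m = np.linalg.solve(L, np.asarray(a_true, float) - np.asarray(z, float))
    p = 1.0
    for m_i, s_i in zip(m, cond_std):
        p *= (std_normal_cdf((1.0 - 2.0 * m_i) / (2.0 * s_i))
              + std_normal_cdf((1.0 + 2.0 * m_i) / (2.0 * s_i)) - 1.0)
    return p

Q = np.array([[4.0, 2.0], [2.0, 2.0]])
print(bootstrapped_pmf([0, 0], [0, 0], Q))   # probability of the correct integer
print(bootstrapped_pmf([1, 0], [0, 0], Q))   # probability of one wrong integer
```

Summing the returned values over a sufficiently large block of integer vectors gives a total close to one, which is a convenient sanity check on an implementation.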

For the purpose of predicting the success of ambiguity
resolution, the probability of correct integer estimation is of particular
interest. For the bootstrapped estimator this success rate is given
in the following corollary.


Corollary 2 (The bootstrapped success rate)

The bootstrapped probability of correct integer estimation (the
success rate) is given as

$$
P(\check a_B = a) = \prod_{i=1}^{n}\left[2\Phi\!\left(\frac{1}{2\sigma_{i|I}}\right) - 1\right] \tag{20}
$$
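
The success rate (20) needs only the conditional standard deviations, which are the square roots of the diagonal of $D$. A minimal sketch, assuming NumPy and using the identity $2\Phi(x) - 1 = \mathrm{erf}(x/\sqrt{2})$; the function name and the example matrices are hypothetical:

```python
import math
import numpy as np

def bootstrapped_success_rate(Q):
    """Eq. (20): product over i of [2*Phi(1 / (2*sigma_{i|I})) - 1]."""
    cond_std = np.diag(np.linalg.cholesky(Q))   # sigma_{i|I}, i = 1..n
    rate = 1.0
    for s in cond_std:
        rate *= math.erf(1.0 / (2.0 * s * math.sqrt(2.0)))
    return rate

Q = np.array([[4.0, 2.0], [2.0, 2.0]])
Q_reordered = Q[::-1, ::-1]        # same ambiguities, opposite order
print(bootstrapped_success_rate(Q), bootstrapped_success_rate(Q_reordered))
```

The two printed values differ slightly, which illustrates that the bootstrapped success rate depends on the ordering (and, more generally, on the parametrization) of the ambiguities.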

The method of integer bootstrapping is easy to implement and,
in contrast to the method of integer least-squares (see next
section), it does not need an integer search for computing the
sought-for integer solution. However, as mentioned earlier, the
outcome of bootstrapping depends on the chosen ambiguity
parametrization. Bootstrapping of DD ambiguities, for instance, will produce
an integer solution which generally differs from the integer
solution obtained from bootstrapping of reparametrized ambiguities.
Since this dependency also holds true for the bootstrapped pmf,
one still has some important degrees of freedom left for improving
(20) or for sharpening the lower bound of (14).

In order to improve the bootstrapped success rate, one should
work with decorrelated ambiguities instead of with the original
ambiguities. The method of bootstrapping performs relatively poorly,
for instance, when applied to the DD ambiguities. This is due to
the usually high correlation between the DD ambiguities.
Bootstrapping should therefore only be used in combination with the
decorrelating Z-transformation of the LAMBDA method
[Teunissen, 1993, 1995]. This transformation decorrelates the ambiguities
further than the best reordering would achieve and thereby reduces
the values of the sequential conditional variances. By reducing
the values of the sequential conditional variances, the bootstrapped
success rate is increased.

It may however happen that it is simply not possible to resolve
the complete vector of ambiguities with sufficient probability. As
an alternative to resolving the complete vector of ambiguities, one
might then consider resolving only a subset of the ambiguities.
The idea of partial ambiguity resolution is based on the fact that
the success rate will generally increase when fewer integer
constraints are imposed. However, in order to apply partial ambiguity
resolution, one first has to determine which subset of
ambiguities to choose. It will be clear that this decision should be
based on the precision of the 'float' ambiguities. The more precise
the ambiguities, the larger the ambiguity success rate. It is at this
point where the decorrelation step of the LAMBDA method and
the bootstrapping principle can be applied. Once the transformed
and decorrelated ambiguity vc-matrix is obtained, the construction
of the subset proceeds in a sequential fashion. One first starts with
the most precise ambiguity, say $\hat z_1$, and computes its success rate
$P(\check z_1 = z_1)$. If this success rate is large enough, one continues and
determines the most precise pair of ambiguities, say $(\hat z_1, \hat z_2)$. If
their success rate is still large enough, one continues again by
trying to extend the set. This procedure continues until one reaches
a point where the corresponding success rate becomes
unacceptably small. When this point is reached, one can expect that the
previously identified ambiguities can be resolved successfully.
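
The sequential construction of the subset can be sketched as follows. This is a simplified illustration rather than the procedure of any particular software: it assumes that the conditional standard deviations of the decorrelated ambiguities are already available and ordered from most to least precise, and it uses the bootstrapped success rate (20) as the acceptance criterion. The function name and the threshold are hypothetical.

```python
import math

def select_partial_subset(cond_std, min_success_rate):
    """Grow the subset while the bootstrapped success rate stays acceptable.

    cond_std: conditional standard deviations (cycles), most precise first.
    Returns the number of ambiguities kept and the success rate of that subset.
    """
    rate = 1.0
    count = 0
    for s in cond_std:
        candidate = rate * math.erf(1.0 / (2.0 * s * math.sqrt(2.0)))
        if candidate < min_success_rate:
            break                   # adding this ambiguity is too risky
        rate = candidate
        count += 1
    return count, rate

print(select_partial_subset([0.05, 0.10, 0.30], 0.99))
```

In this example the first two ambiguities are retained and the third is left 'float', because including it would lower the success rate to about 0.90.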

Once the subset for partial ambiguity resolution has been
identified, one still needs to determine what this will do to improve the
baseline estimator. After all, being able to successfully resolve the
ambiguities does not necessarily mean that the 'fixed' solution is
significantly better than the 'float' solution. The theory presented
in the previous sections provides the necessary tools for performing
such an evaluation.

4.  INTEGER  LEAST-SQUARES 
4.1.  The  ILS  estimator 

In this section we review some integer least-squares theory for
solving the GNSS model (1). When using the least-squares
principle, the GNSS model can be solved by means of the minimization
problem

$$
\min_{a,b} \| y - Aa - Bb \|^2_{Q_y}, \quad a \in Z^n,\ b \in R^p \tag{21}
$$

with $Q_y$ the vc-matrix of the GNSS observables. This type of
least-squares problem was first introduced in [Teunissen, 1993] and has
been coined with the term 'integer least-squares'. It is a
nonstandard least-squares problem due to the integer constraints $a \in Z^n$.
The solution of (21) is consistent with the three solution steps of
section 1. This can be seen as follows. It follows from the
orthogonal decomposition

$$
\| y - Aa - Bb \|^2_{Q_y} = \| \hat e \|^2_{Q_y} + \| \hat a - a \|^2_{Q_{\hat a}} + \| \hat b(a) - b \|^2_{Q_{\hat b|\hat a}} \tag{22}
$$

with $\hat e = y - A\hat a - B\hat b$ and $\hat b(a) = \hat b - Q_{\hat b\hat a}Q_{\hat a}^{-1}(\hat a - a)$, that the
sought-for minimum is obtained when the second term on the
right-hand side is minimized for $a \in Z^n$ and the last term is set to
zero. The integer least-squares (ILS) estimator of the ambiguities
is therefore defined as follows.


Definition 3 (Integer least-squares)

Let $\hat a = (\hat a_1, \ldots, \hat a_n)^T \in R^n$ be the ambiguity 'float' solution and let
$\check a_{LS} \in Z^n$ denote the corresponding integer least-squares solution.
Then

$$
\check a_{LS} = \arg\min_{z \in Z^n} \| \hat a - z \|^2_{Q_{\hat a}} \tag{23}
$$


In contrast to integer rounding and integer bootstrapping, an
integer search is needed to compute $\check a_{LS}$. Although we will refrain
from discussing the computational intricacies of ILS estimation,
the conceptual steps of the computational procedure will be
described briefly. The ILS procedure is mechanized in the GNSS
LAMBDA (Least-squares AMBiguity Decorrelation Adjustment)
method, which is currently one of the most applied methods for
GNSS carrier phase ambiguity resolution. For more information
on the LAMBDA method, we refer to e.g. [Teunissen, 1993],
[Teunissen, 1995] and [de Jonge and Tiberius, 1996a] or to the
textbooks [Hofmann-Wellenhof 1997], [Strang and Borre, 1997],
[Teunissen and Kleusberg, 1998]. Practical results obtained with it
can be found, for example, in [Boon and Ambrosius, 1997], [Boon
et al., 1997], [Cox and Brading, 1999], [de Jonge and Tiberius,
1996b], [de Jonge et al., 1996], [Han, 1995], [Jonkman, 1998],
[Peng et al., 1999], [Tiberius and de Jonge, 1995], [Tiberius et al.,
1997].

The main steps as implemented in the LAMBDA method are
as follows. One starts by defining the ambiguity search space

$$
\Omega_a = \{a \in Z^n \mid (\hat a - a)^T Q_{\hat a}^{-1} (\hat a - a) \le \chi^2\} \tag{24}
$$

with $\chi^2$ a positive constant to be chosen. The boundary of this
search space is ellipsoidal. It is centred at $\hat a$, its shape is
governed by the vc-matrix $Q_{\hat a}$ and its size is determined by $\chi^2$. In
case of GNSS, the search space is usually extremely elongated,
due to the high correlations between the ambiguities. Since this
extreme elongation usually hinders the computational efficiency of
the search, the search space is first transformed to a more spherical
shape,

$$
\Omega_z = \{z \in Z^n \mid (\hat z - z)^T Q_{\hat z}^{-1} (\hat z - z) \le \chi^2\} \tag{25}
$$

using the admissible ambiguity transformations $\hat z = Z^T \hat a$, $Q_{\hat z} =
Z^T Q_{\hat a} Z$. Ambiguity transformations $Z$ are said to be admissible
when both $Z$ and its inverse $Z^{-1}$ have integer entries. Such
matrices preserve the integer nature of the ambiguities. In order for the
transformed search space to become more spherical, the
volume-preserving Z-transformation is constructed as a transformation that
decorrelates the ambiguities as much as possible. Using the
triangular decomposition of $Q_{\hat z}$, the left-hand side of the quadratic
inequality in (25) is then written as a sum-of-squares:


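$$
\sum_{i=1}^{n} \frac{(\hat z_{i|I} - z_i)^2}{\sigma^2_{i|I}} \le \chi^2 \tag{26}
$$

On the left-hand side one recognizes the conditional least-squares
estimator $\hat z_{i|I}$, which follows when the conditioning takes place on
the integers $z_1, z_2, \ldots, z_{i-1}$. Using the sum-of-squares structure,
one can finally set up the $n$ intervals which are used for the search.
These sequential intervals are given as

$$
\begin{aligned}
(\hat z_1 - z_1)^2 &\le \sigma_1^2\,\chi^2 \\
(\hat z_{2|1} - z_2)^2 &\le \sigma_{2|1}^2\left(\chi^2 - \frac{(\hat z_1 - z_1)^2}{\sigma_1^2}\right) \\
&\;\;\vdots
\end{aligned}
\tag{27}
$$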

In order for the search to be efficient, one not only would like the
vc-matrix $Q_{\hat z}$ to be as close as possible to a diagonal matrix, but
also that the search space does not contain too many integer grid
points. This requires the choice of a small value for $\chi^2$, but one
that still guarantees that the search space contains at least one
integer grid point. Since the bootstrapped estimator is so easy to
compute and at the same time gives a good approximation to the
ILS estimator (see section 4.4), the bootstrapped solution is an
excellent candidate for setting the size of the ambiguity search space.
Following the decorrelation step $\hat z = Z^T \hat a$, the LAMBDA method
therefore uses, as one of its options, the bootstrapped solution $\check z_B$
for setting the size of the ambiguity search space as

$$
\chi^2 = (\hat z - \check z_B)^T Q_{\hat z}^{-1} (\hat z - \check z_B) \tag{28}
$$

In this way one can work with a very small search space and still
guarantee that the sought-for integer least-squares solution is
contained in it.
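
The conceptual steps (26)-(28) can be put together in a short depth-first search. The sketch below is not the LAMBDA implementation: it assumes NumPy, omits the decorrelating Z-transformation (the search is run directly on the supplied vc-matrix), keeps $\chi^2$ fixed at the bootstrapped value of (28), and uses hypothetical function names.

```python
import math
import numpy as np

def ils_search(z_float, Q):
    """Search the intervals (27) with chi^2 set by the bootstrapped solution (28)."""
    C = np.linalg.cholesky(Q)
    D = np.diag(C) ** 2                 # conditional variances sigma^2_{i|I}
    L = C / np.diag(C)                  # unit lower triangular factor
    n = len(z_float)

    # bootstrapped solution and the resulting size of the search space
    boot, resid = np.zeros(n), np.zeros(n)
    for i in range(n):
        cond = z_float[i] - L[i, :i] @ resid[:i]
        boot[i] = np.round(cond)
        resid[i] = cond - boot[i]
    chi2 = float(np.sum(resid ** 2 / D))

    best = {"z": boot.copy(), "dist": chi2}
    z, r = np.zeros(n), np.zeros(n)

    def descend(i, used):
        if i == n:
            if used < best["dist"]:
                best["z"], best["dist"] = z.copy(), used
            return
        cond = z_float[i] - L[i, :i] @ r[:i]          # conditional estimate
        half = math.sqrt(max(D[i] * (chi2 - used), 0.0))
        lo = math.ceil(cond - half - 1e-12)
        hi = math.floor(cond + half + 1e-12)
        for cand in range(lo, hi + 1):
            z[i], r[i] = cand, cond - cand
            descend(i + 1, used + r[i] ** 2 / D[i])

    descend(0, 0.0)
    return best["z"].astype(int), best["dist"]

Q = np.array([[4.0, 2.0], [2.0, 2.0]])
print(ils_search(np.array([1.4, 2.6]), Q))
```

For this illustrative input the bootstrapped solution is (1, 2) with squared distance 0.2, while the search finds the ILS solution (2, 3) with squared distance 0.1. Only two candidates for the first ambiguity have to be visited, which reflects the small search space obtained from (28).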

4.2. The ILS pull-in region

The pull-in regions of integer rounding are unit cubes, while those
of integer bootstrapping are multivariate versions of parallelograms.
To determine the ILS pull-in regions we need to know the set of
'float' solutions $\hat a \in R^n$ that are mapped to the same integer
vector $z \in Z^n$. This set is described by all $x \in R^n$ that satisfy $z =
\arg\min_{u \in Z^n} \| x - u \|^2_{Q_{\hat a}}$. The ILS pull-in region that belongs to
the integer vector $z$ follows therefore as

$$
S_{LS,z} = \{x \in R^n \mid \| x - z \|^2_{Q_{\hat a}} \le \| x - u \|^2_{Q_{\hat a}},\ \forall u \in Z^n\} \tag{29}
$$

It consists of all those points which are closer to $z$ than to any other
integer point in $R^n$. The metric used for measuring these distances
is determined by the vc-matrix $Q_{\hat a}$. Based on (29), one can give a
representation of the ILS pull-in regions that resembles the
representation of the bootstrapped pull-in regions. This representation
reads as follows.

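Theorem 5 (ILS pull-in regions)

The pull-in regions of the ILS ambiguity estimator $\check a_{LS} \in Z^n$ are
given as

$$
S_{LS,z} = \bigcap_{c_i \in Z^n} \{x \in R^n \mid |c_i^T Q_{\hat a}^{-1}(x - z)| \le \tfrac{1}{2} \| c_i \|^2_{Q_{\hat a}}\}, \quad \forall z \in Z^n \tag{30}
$$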


This shows that the ILS pull-in regions are constructed from
intersecting half-spaces. One can also show that at most $2^n - 1$ pairs
of such half-spaces are needed for constructing the pull-in region.
The ILS pull-in regions are convex, symmetric sets of volume 1,
which satisfy the conditions of Definition 1. The ILS estimator is
therefore admissible. The ILS pull-in regions are hexagons in the
two-dimensional case.
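
The step from (29) to (30) is short and may be worth writing out. The following worked derivation is an added sketch; it writes $c = u - z$ for an arbitrary integer vector and expands the squared norm $\|v\|^2_{Q_{\hat a}} = v^T Q_{\hat a}^{-1} v$.

```latex
\begin{aligned}
\| x - z \|^2_{Q_{\hat a}} \le \| x - z - c \|^2_{Q_{\hat a}}
  &\iff 0 \le -2\,c^T Q_{\hat a}^{-1}(x - z) + \| c \|^2_{Q_{\hat a}} \\
  &\iff c^T Q_{\hat a}^{-1}(x - z) \le \tfrac{1}{2}\,\| c \|^2_{Q_{\hat a}} .
\end{aligned}
```

Since the inequality must hold for every $c \in Z^n$, it also holds for $-c$, which gives $-c^T Q_{\hat a}^{-1}(x - z) \le \tfrac{1}{2}\| c \|^2_{Q_{\hat a}}$. Combining both yields the absolute value in (30), and each integer vector $c$ thus contributes one pair of half-spaces.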

4.3.  Maximizing  the  success-rate 

Although various integer estimators exist which are admissible,
some may be better than others. Having the problem of GNSS
ambiguity resolution in mind, one is particularly interested in the
estimator which maximizes the probability of correct integer
estimation. This probability equals $P(\check a = a)$, but it will differ for
different ambiguity estimators. The following theorem shows that
the ILS estimator maximizes the probability of correct integer
estimation.

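Theorem 6 (ILS is optimal)

Let the pdf of the 'float' solution $\hat a$ be given as

$$
p_{\hat a}(x) = \sqrt{\det(Q_{\hat a}^{-1})}\; G(\| x - a \|^2_{Q_{\hat a}}) \tag{31}
$$

where $G : R \mapsto [0, \infty)$ is decreasing and $Q_{\hat a}$ is positive-definite.
Then

$$
P(\check a_{LS} = a) \ge P(\check a = a) \tag{32}
$$

for any admissible estimator $\check a$.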

This theorem gives a probabilistic justification for using the ILS
estimator. For GNSS ambiguity resolution it shows that one is
better off using the ILS estimator than any other admissible
integer estimator. The family of distributions defined in (31) is known
as the family of elliptically contoured distributions. Several
important distributions belong to this family. The multivariate
normal distribution can be shown to be a member of this family by
choosing $G(x) = (2\pi)^{-n/2} \exp(-\tfrac{1}{2}x)$, $x \in R$. Another member is the
multivariate t-distribution.

As a direct consequence of the above theorem we have the
following corollary.

Corollary 3 (The effect of the weight matrix)

Let $\Sigma$ be any positive-definite matrix of order $n$ and define

$$
\check a_{\Sigma} = \arg\min_{z \in Z^n} \| \hat a - z \|^2_{\Sigma} \tag{33}
$$

Then $\check a_{\Sigma}$ is admissible and

$$
P(\check a_{LS} = a) \ge P(\check a_{\Sigma} = a) \tag{34}
$$


In order to prove the corollary, we only need to show that $\check a_{\Sigma}$ is
admissible. Once this has been established, the stated result (34)
follows from theorem 6. The admissibility can be shown as
follows. The first two conditions of Definition 1 are satisfied, since
the ILS-map produces - apart from boundary ties - a unique integer
vector for any 'float' solution $\hat a \in R^n$. And since
$\check a_{\Sigma} = \arg\min_{z \in Z^n} \| \hat a - u - z \|^2_{\Sigma} + u$
holds true for any integer $u \in Z^n$, also the integer
remove-restore technique applies.

As the corollary shows, a proper choice of the data weight
matrix is also of importance for ambiguity resolution. The choice of
weights is optimal when the weight matrix equals the inverse of the
ambiguity vc-matrix. A too optimistic precision description or a
too pessimistic precision description will both result in a less than
optimal ambiguity success rate. In the case of GNSS, the
observation equations (the functional model) are sufficiently known and
well documented. However, the same cannot yet be said of the
vc-matrix of the GNSS data. In the many GNSS textbooks available,
we will usually find only a few comments, if any, on this vc-matrix.
Examples of studies that have been reported in the literature are:
[Euler and Goad, 1991], [Gerdan, 1995], [Gianniou, 1996], and
[Jin and de Jong, 1996], who studied the elevation dependence of
the observation variances; [Jonkman, 1998] and [Tiberius, 1998],
who considered time correlation and cross correlation; and
[Schaffrin and Bock, 1988], [Bock, 1998] and [Teunissen, 1998a], who
considered the inclusion of stochastic ionospheric constraints.

4.4.  Bounding  the  ILS  success-rate 

A very useful application of theorem 6 is that it shows how one
can lower-bound the ILS probability of correct integer estimation.
This is particularly useful since the ILS success rate is usually
difficult to compute, owing to the rather complicated
geometry of the ILS pull-in region. The bootstrapped success rate
is a good candidate for a lower bound of the ILS success rate. The
bootstrapped success rate is easy to compute and it becomes a
sharp lower bound when applied to the decorrelated ambiguities
$\hat z = Z^T \hat a$. In fact, at present, the bootstrapped success rate is the
sharpest available lower bound of the ILS success rate.

Apart from having a lower bound, it is also useful to have
an upper bound available. For obtaining an upper bound one can
make use of the geometric mean of the ambiguity conditional
standard deviations. This geometric mean is referred to as the Ambiguity
Dilution of Precision (ADOP) and it is given as

$$
\mathrm{ADOP} = \sqrt{\det Q_{\hat a}}^{\,1/n} \quad \text{(cycles)} \tag{35}
$$

Note that this scalar measure of the ambiguity precision is invariant
for the admissible volume-preserving ambiguity transformations.
With the ADOP one can obtain an upper bound by making use of
the fact that the probability content of the ILS pull-in region $S_{LS,a}$
would be maximal if its shape coincided with that of the
ambiguity search space, while its volume would still be constrained
to 1. We have the following bounds for the ILS success rate.

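Theorem 7 (Bounds on the ILS success-rate)

The ILS success rate $P(\check a_{LS} = a)$ is bounded from below and from
above as

$$
P(\check z_B = z) \le P(\check a_{LS} = a) \le P\!\left(\chi^2(n, 0) \le \frac{c_n}{\mathrm{ADOP}^2}\right) \tag{36}
$$

with $c_n = \left(\tfrac{n}{2}\Gamma(\tfrac{n}{2})\right)^{2/n} / \pi$.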
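
Both bounds of (36) are cheap to evaluate. The sketch below is an illustration under stated assumptions: NumPy is assumed, the central chi-square distribution function is computed with a plain series for the regularized incomplete gamma function (adequate for moderate arguments), and the lower bound is the bootstrapped success rate of the ambiguities as supplied, so it is sharpest when a decorrelated vc-matrix is passed in. All function names and the example matrix are hypothetical.

```python
import math
import numpy as np

def chi2_cdf(x, n):
    """P(chi^2(n, 0) <= x) via the series of the regularized incomplete gamma."""
    a, h = 0.5 * n, 0.5 * x
    if h <= 0.0:
        return 0.0
    terms = int(h + 10.0 * math.sqrt(h) + 50.0)
    log_h = math.log(h)
    total = sum(math.exp((a + k) * log_h - h - math.lgamma(a + k + 1.0))
                for k in range(terms))
    return min(1.0, total)

def ils_success_rate_bounds(Q):
    """Return (lower, upper, ADOP) following eqs. (20), (35) and (36)."""
    n = Q.shape[0]
    cond_std = np.diag(np.linalg.cholesky(Q))       # sigma_{i|I}
    lower = 1.0
    for s in cond_std:
        lower *= math.erf(1.0 / (2.0 * s * math.sqrt(2.0)))
    adop = float(np.prod(cond_std)) ** (1.0 / n)    # equals sqrt(det Q)^(1/n)
    c_n = (0.5 * n * math.gamma(0.5 * n)) ** (2.0 / n) / math.pi
    upper = chi2_cdf(c_n / adop ** 2, n)
    return lower, upper, adop

Q = 0.09 * np.array([[1.0, 0.95], [0.95, 1.0]])
print(ils_success_rate_bounds(Q))
```

For this strongly correlated example the ADOP is about 0.17 cycles, the lower bound is about 0.90 and the upper bound about 0.997.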


5.  A  BAYESIAN  APPROACH 
5.1.  The  Bayes  estimate 

The Bayesian approach to GNSS carrier phase ambiguity
resolution starts from a set of assumptions which differs fundamentally
from the one used in the previous sections, see e.g. [Betti et al.,
1993], [Gundlich and Koch, 2001]. In the Bayesian approach, not
only the vector of observables, $y$, is assumed to be random, but
the vector of unknown parameters, $a$ and $b$, as well. Although the
Bayesian approach has not yet found widespread use in any of
the GNSS applications, the basic concepts involved are of interest
in their own right, also in their comparison with the non-Bayesian
theory of the previous sections.

Let us for the moment take the two types of parameter vectors
$a$ and $b$ together in one vector $x = (a^T, b^T)^T$. If both $y$ and $x$ are
random, we have according to Bayes' theorem

$$
p(x \mid y) = \frac{p(y \mid x)\,p(x)}{p(y)} \tag{37}
$$

Thus the posterior density $p(x \mid y)$ is proportional to the product of
the likelihood function $p(y \mid x)$ and the prior density $p(x)$. Given
the data vector $y$, that is, given the observations, the posterior
density gives a complete description of the probabilistic properties of
$x$. The idea of the Bayesian approach is therefore to use the
posterior density for parameter estimation.

In the Bayesian approach to ambiguity resolution it is the
so-called Bayes estimate which is used as the solution for the
ambiguities and baseline. This estimate is defined as follows.

Definition 4 (The Bayes estimate)

The Bayes estimate $\hat x_{Bayes}$ of the random parameter vector $x$ is
defined as the conditional mean

$$
\hat x_{Bayes} = E\{x \mid y\} = \int x\,p(x \mid y)\,dx \tag{38}
$$


This definition can be motivated as follows. In order to find a
'good' estimate $\hat x$ of the parameter vector $x$, we would like to
determine a function of the data, say $\hat x = \hat x(y)$, which in a certain
sense is close to $x$. Let $L(x, \hat x(y))$ be our measure of discrepancy,
or our measure of loss, between $x$ and $\hat x$. It then seems reasonable
to take $\hat x$ as the solution which minimizes this discrepancy on the
average. This amounts to solving the minimization problem

$$
\min_{\hat x} E\{L(x, \hat x(y)) \mid y\} = \min_{\hat x} \int L(x, \hat x)\,p(x \mid y)\,dx \tag{39}
$$

This minimization problem is particularly easy to solve in case the
loss function equals the quadratic form $L(x, \hat x) = \| x - \hat x \|^2_Q$, with
matrix $Q$ being positive definite. From the decomposition

$$
\begin{aligned}
E\{L(x, \hat x(y)) \mid y\} &= \int \| x - \hat x \|^2_Q\,p(x \mid y)\,dx \\
&= \int \| x - E\{x \mid y\} \|^2_Q\,p(x \mid y)\,dx + \| E\{x \mid y\} - \hat x \|^2_Q
\end{aligned}
$$

it directly follows that the posterior expected loss is minimized
when $\hat x$ is taken equal to the Bayes estimate. When the Bayes
estimate is substituted into the loss function, the expected loss equals
$E\{L(x, \hat x_{Bayes})\} = \mathrm{trace}(Q_{x|y}Q^{-1})$.


5.2.  The  marginal  posterior  pdf’s 


In order to apply (38) to our ambiguity resolution problem, we first
need an expression for the posterior density $p(x \mid y) = p(a, b \mid y)$.
In the Bayesian approach to GNSS ambiguity resolution, $a$ and $b$
are assumed to be independent, with the following improper priors

$$
\begin{cases}
p(a) \propto \sum_{z \in Z^n} \delta(a - z) & \text{(pulsetrain)} \\
p(b) \propto \text{constant}
\end{cases}
\tag{40}
$$

where $\delta$ denotes the Dirac function. From the orthogonal
decomposition (22), the likelihood function can be seen to be
proportional to $p(y \mid a, b) \propto \exp -\tfrac{1}{2}\{\| \hat a - a \|^2_{Q_{\hat a}} + \| \hat b(a) - b \|^2_{Q_{\hat b|\hat a}}\}$. The
posterior density follows therefore as

$$
p(a, b \mid y) \propto \exp -\tfrac{1}{2}\{\| \hat a - a \|^2_{Q_{\hat a}} + \| \hat b(a) - b \|^2_{Q_{\hat b|\hat a}}\} \sum_{z \in Z^n} \delta(a - z) \tag{41}
$$

The required marginal posterior densities, $p(a \mid y)$ and $p(b \mid y)$,
follow from integrating the joint posterior density over the domains
of respectively $b$ and $a$. Note that in the present case, the domain
of $a$ is taken as $R^n$ and not as $Z^n$. In the Bayesian approach, the
discrete-like nature of $a$ is thought to be captured by assuming the
prior to be a pulsetrain. Once the integrations are carried out and
the normalizing constants are restored, the marginals are obtained
as follows.


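Theorem 8 (Marginal posterior pdf's)

The posterior pdf's of the ambiguities and baseline are given as

$$
\begin{cases}
p(a \mid y) = w_a(\hat a) \sum_{z \in Z^n} \delta(a - z) \\
p(b \mid y) = \sum_{z \in Z^n} p_{b|a}(b \mid a = z, y)\,w_z(\hat a)
\end{cases}
\tag{42}
$$

with the weight function

$$
w_z(\hat a) = \frac{\exp -\tfrac{1}{2}\{\| \hat a - z \|^2_{Q_{\hat a}}\}}{\sum_{u \in Z^n} \exp -\tfrac{1}{2}\{\| \hat a - u \|^2_{Q_{\hat a}}\}}, \quad z \in Z^n \tag{43}
$$

and the conditional posterior

$$
p_{b|a}(b \mid a, y) = \frac{1}{\sqrt{\det Q_{\hat b|\hat a}}\,(2\pi)^{p/2}} \exp -\tfrac{1}{2} \| b - \hat b(a) \|^2_{Q_{\hat b|\hat a}} \tag{44}
$$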


It is now interesting to observe how the above posterior
marginal pdf for the baseline, $p(b \mid y)$, compares with the pdf of the
'fixed' baseline, $p_{\check b}(x)$, as given in (10). Both pdf's are very
similar in structure. Both equal an infinite sum of weighted conditional
baseline distributions. The two types of conditional baseline
distributions, $p_{\hat b|\hat a}(x \mid z)$ and $p_{b|a}(b \mid z, y)$, have an identical shape but
differ in their point of symmetry. The first is symmetric about
$b(z) = b - Q_{\hat b\hat a}Q_{\hat a}^{-1}(a - z)$, while the second is symmetric about
$\hat b(z) = \hat b - Q_{\hat b\hat a}Q_{\hat a}^{-1}(\hat a - z)$. Also the weights share some
resemblance. This can be seen if we consider the probability

$$
P(\check a = a - z) = \int_{S_{a-z}} \left(\sqrt{\det Q_{\hat a}}\,(2\pi)^{n/2}\right)^{-1} \exp -\tfrac{1}{2} \| x - a \|^2_{Q_{\hat a}}\,dx
$$

This probability can be worked out to give

$$
P(\check a = a - z) = \frac{\int_{S_0} \exp -\tfrac{1}{2} \| x - z \|^2_{Q_{\hat a}}\,dx}{\sum_{u \in Z^n} \int_{S_0} \exp -\tfrac{1}{2} \| x - u \|^2_{Q_{\hat a}}\,dx} \tag{45}
$$

which shows the resemblance with (43).


5.3. The Bayes baseline

With the posterior baseline distribution available, one can now
study the corresponding confidence regions as well as determine
the Bayes estimate of the baseline,

$$
\hat b_{Bayes} = \int b\,p(b \mid y)\,db
$$

For a discussion on how to approximate the confidence regions of
the posterior baseline, we refer to [Gundlich and Koch, 2001].
Using the results of theorem 8, the Bayes baseline follows as

$$
\hat b_{Bayes} = \hat b - Q_{\hat b\hat a}Q_{\hat a}^{-1}\Big(\hat a - \sum_{z \in Z^n} z\,w_z(\hat a)\Big) \tag{46}
$$

Again there is a striking resemblance with the results of section
2. From (4) and (2.2) it follows that the 'fixed' baseline can be
written as

$$
\check b = \hat b - Q_{\hat b\hat a}Q_{\hat a}^{-1}\Big(\hat a - \sum_{z \in Z^n} z\,s_z(\hat a)\Big) \tag{47}
$$

We thus see that the two solutions differ in the way the 'float'
solution $\hat a$ is used to weigh all integer grid points $z \in Z^n$. In case of the
Bayes baseline the smooth weights $w_z(\hat a)$ are used, while in case
of the 'fixed' baseline, the 0 - 1 values of the indicator function
$s_z(\hat a)$ are used. Although both baseline solutions contain an
infinite sum, the one of the 'fixed' baseline can be computed exactly,
while the one of the Bayes baseline can only be approximated.
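
One simple way to approximate the infinite sum in (46) is to truncate it to a finite block of integer vectors around the 'float' solution. The sketch below does exactly that; it is an illustration only, it assumes NumPy, and the truncation half-width, the function names and the numerical values in the example are hypothetical.

```python
import itertools
import math
import numpy as np

def bayes_weights(a_float, Q_a, half_width=10):
    """Weights w_z of eq. (43), with the sum truncated to a block around a_float."""
    a_float = np.asarray(a_float, float)
    Q_inv = np.linalg.inv(Q_a)
    ranges = [range(math.floor(v - half_width), math.ceil(v + half_width) + 1)
              for v in a_float]
    grid = np.array(list(itertools.product(*ranges)), dtype=float)
    r = a_float - grid
    q = np.einsum("ij,jk,ik->i", r, Q_inv, r)     # squared distances to each z
    w = np.exp(-0.5 * (q - q.min()))              # shifted for numerical stability
    return grid, w / w.sum()

def bayes_baseline(b_float, a_float, Q_ba, Q_a, half_width=10):
    """Truncated version of eq. (46)."""
    grid, w = bayes_weights(a_float, Q_a, half_width)
    a_weighted = w @ grid                          # sum over z of z * w_z
    diff = np.asarray(a_float, float) - a_weighted
    return np.asarray(b_float, float) - Q_ba @ np.linalg.solve(Q_a, diff)

Q_a = np.array([[0.01]])
Q_ba = np.array([[0.005]])
print(bayes_baseline([1.0], [0.1], Q_ba, Q_a))    # precise float ambiguity
print(bayes_baseline([1.0], [0.5], Q_ba, np.array([[0.04]])))  # float midway between integers
```

In the first case nearly all weight falls on z = 0, so the Bayes baseline is practically equal to the 'fixed' baseline (0.95 here). In the second case the weights are symmetric about 0.5 and the Bayes baseline equals the 'float' baseline, whereas the 'fixed' baseline would have to commit to one of the two neighbouring integers.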

ABSTRACT


On the Role of Linear Precoding in Signal Processing for Wireless


Georgios  B.  Giannakis 
University  of  Minnesota,  USA 

This  talk  introduces  linear  precoding  (LP)  as  a  useful  signal  processing  tool  for  coping  with 
frequency-selective  propagation  channels  encountered  with  high-rate  wireless  block 
transmissions.  The  importance  of  LP  will  be  presented  for  single-  and  multi-carrier  (OFDM) 
systems,  and  its  links  with  error-control  coding  will  be  delineated. 

Its  features  will  be  described  both  for  point-to-point  and  multiple  access  links  with  emphasis 
placed  on  the  generalized  multicarrier  CDMA,  and  the  novel  ideas  of  block-spreading  and 
chip-interleaving. 

ABSTRACT

In this paper we present new algorithms that aim at estimating the
"static" parameters of a latent variable process in an on-line
manner. This new class of on-line algorithms is inspired by Monte
Carlo Markov chain (MCMC) methods whose use has been mainly
restricted to static problems, i.e. for which the set of observations
is fixed. The main interest of this new class of algorithms is that it
combines MCMC and particle filtering techniques, for which
extensive know-how and literature are now available.

1  Introduction 

1.1  Problem  Statement 

We consider here the problem of on-line estimation of the parameters of latent variable models, here modelled as Markov chains. More precisely, we define two real vector-valued stochastic processes $\{x_t; t \in \mathbb{N}\}$ and $\{y_t; t \in \mathbb{N}^*\}$, where the process $x_t \in \mathbb{R}^{n_x}$ is usually called the signal process and the process $y_t \in \mathbb{R}^{n_y}$ is called the observation process. The signal process $x_t$ is here assumed to be a Markov process with initial density $x_0 \sim p(x_0)$ and transition probability densities from state $x_{t-1}$ to state $x_t$, $K_\theta(x_t | x_{t-1})$. The observations are, conditional upon $x_t$, independent, and the conditional marginal density of $y_t$ is $g(y_t | x_t, \theta)$. The parameter $\theta \in \mathbb{R}^{n_\theta}$ is unknown and our aim is, given the observations $y_1, \ldots, y_t$, to estimate sequentially in time the unknown parameter $\theta$. The problem is of interest in many applications of signal processing, and the importance of this class of problems has generated a vast literature. We review here some of the proposed solutions.

1.2  Brief  Literature  Review 

Assume that an estimate of $\theta$ at time $t-1$, noted $\theta^{(t-1)}$, is available and that it is possible to compute exactly the optimal filtering density $p(x_t | y_{1:t}, \theta^{(t-1)})$ and some of its associated statistics. Then one can use on-line Gradient/EM (Expectation-Maximization) algorithms to estimate $\theta$ [3], [5] so as to approach the maximum of $p(y_{1:t} | \theta)$ as $t \to +\infty$. In practice, at time $t$, a statistic $S(\theta^{(t-1)}, \theta)$ of the filtering density $p(x_t | y_{1:t}, \theta^{(t-1)})$ is evaluated, then the current value of the parameter $\theta^{(t-1)}$ is updated deterministically in order to maximize $S(\theta^{(t-1)}, \theta)$ with respect to $\theta$, and so on.


This has been proposed as a solution to our problem by many authors in both the statistical and signal processing literatures. Under some regularity assumptions, it can be shown that these algorithms are able to track the local maxima of the series of likelihoods $p(y_{1:t} | \theta)$, $t = 1, \ldots$ Unfortunately, an analytic expression of the quantity $S(\theta^{(t-1)}, \theta)$, which is an integral with respect to the filtering distribution $p(x_t | y_{1:t}, \theta^{(t-1)})$, can only be obtained for restricted classes of processes. For many models, which typically involve elements of non-Gaussianity and nonlinearity in the dynamics, one can in principle use any numerical technique to approximate it. The EKF (Extended Kalman Filter) is one of the earliest such approximations, valid for continuous state-spaces which are not "too" nonlinear. It should be added that in practice, even in favourable cases, these recursive parameter estimation methods are very sensitive to initialization and can easily get trapped in local maxima.

Another approach for parameter estimation consists of using a full Bayesian approach where $\theta$ is assumed to be random with a given prior density $p(\theta)$. Then it is possible to define the following filtering density $p(x_t, \theta | y_{1:t})$ on the "extended state" $(x_t, \theta)$. However, this quantity can only rarely be evaluated analytically. The advent of powerful and cheap computers has permitted the development and the application of an efficient and versatile class of numerical methods that address the filtering problem: sequential Monte Carlo methods, also known as particle filters [4]. Using such methods, one can compute the filter on the extended state $(x_t, \theta)$, that is $p(x_t, \theta | y_{1:t})$, and consequently perform inference on $\theta$. Unfortunately, although attractive, this approach does not work in practice. Indeed the extended dynamic model is not ergodic and there is an accumulation of errors over time, whatever the particle filtering method used [1]. This is what motivates the following section, where we discuss a new class of algorithms to address static parameter estimation.

2  Recursive  Monte  Carlo  Algorithms  for  Parameter 
Estimation 

2.1  Principle 

We first recall here the principle of MCMC methods and illustrate the Gibbs sampler on our problem, when the set of observations does not evolve with time. Then we show how it is possible to adapt this MCMC scheme for on-line estimation purposes. In our case MCMC methods will be designed to obtain samples $\{x_{1:t}^{(i)}, \theta^{(i)}\}$ $(i = 1, \ldots)$ from, say, the joint posterior distribution $p(x_{1:t}, \theta | y_{1:t})$. The samples can be used to evaluate integrals, for example [6]. In many cases sampling from the joint distribution might be too complicated, whereas sampling from the conditional distributions $p(x_{1:t} | y_{1:t}, \theta)$ and $p(\theta | x_{1:t}, y_{1:t})$ might be routine. The Gibbs sampler, or data augmentation algorithm, exploits this fact and can be described as follows:


Data augmentation

• Initialization: $i = 0$ and $\theta^{(0)}$.

• Repeat iteration $i$

- $x_{1:t}^{(i)} \sim p(x_{1:t} | y_{1:t}, \theta^{(i-1)})$

- $\theta^{(i)} \sim p(\theta | x_{1:t}^{(i)}, y_{1:t})$

- $i \leftarrow i + 1$
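The two sampling steps can be sketched on a toy problem. The following is only an illustration under assumptions that are not in the text: a two-component Gaussian mixture with unit variances, equal weights, and independent $N(0, 100)$ priors on the two unknown means. The function name `gibbs_two_means` and all defaults are hypothetical.

```python
import math
import random

def gibbs_two_means(y, n_iter=300, prior_var=100.0, seed=0):
    """Data augmentation for a toy 2-component mixture (assumed model):
    unit variances, equal weights, unknown means with N(0, prior_var) priors."""
    rng = random.Random(seed)
    mu = [-1.0, 1.0]                      # theta^(0), arbitrary starting point
    draws = []
    for _ in range(n_iter):
        # Step 1: sample the hidden allocations given the current means.
        sums = [0.0, 0.0]
        counts = [0, 0]
        for obs in y:
            # d = log p(obs | component 0) - log p(obs | component 1)
            d = 0.5 * ((obs - mu[1]) ** 2 - (obs - mu[0]) ** 2)
            d = max(-700.0, min(700.0, d))    # guard against overflow in exp
            p1 = 1.0 / (1.0 + math.exp(d))    # Pr(allocation = 1 | obs, means)
            j = 1 if rng.random() < p1 else 0
            sums[j] += obs
            counts[j] += 1
        # Step 2: sample the means given the allocations (conjugate normal update).
        for j in range(2):
            post_var = 1.0 / (counts[j] + 1.0 / prior_var)
            mu[j] = rng.gauss(post_var * sums[j], math.sqrt(post_var))
        draws.append(tuple(mu))
    return draws
```

Averaging the draws after a burn-in period gives a Monte Carlo estimate of the posterior means; the observation set is fixed throughout, which is exactly the static setting discussed here.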


It can be shown that under mild conditions the homogeneous Markov chain described above is ergodic and produces, at least asymptotically, samples from the joint distribution $p(x_{1:t}, \theta | y_{1:t})$. More precisely, for a properly defined norm on distributions it can be shown that $\lim_{i \to +\infty} \| p(x_{1:t}^{(i)}, \theta^{(i)}) - p(x_{1:t}, \theta | y_{1:t}) \| = 0$, or perhaps more interestingly here $\lim_{i \to +\infty} \| p(\theta^{(i)}) - p(\theta | y_{1:t}) \| = 0$. This algorithm is not designed to achieve our goal of on-line estimation. However, it can be adapted to the on-line case in the following simple way:


Recursive Data Augmentation (RDA)

• Initialization: $t = 0$ and $\theta^{(0)}$.

• Iteration $t$

- $x_{1:t}^{(t)} \sim p(x_{1:t} | y_{1:t}, \theta^{(t-1)})$

- $\theta^{(t)} \sim p(\theta | x_{1:t}^{(t)}, y_{1:t})$

- $t \leftarrow t + 1$


that is, $i = t$ in this case. It should be stressed that, contrary to the Gibbs sampler presented above, the Markov chain here is non-homogeneous, due to the fact that the set of observations evolves with time. It can however be shown that at iteration $t$, the invariant distribution of the RDA is the joint posterior distribution $p(x_{1:t}, \theta | y_{1:t})$. One can therefore hope that as $t$ becomes large, and if $p(x_{1:t+1}, \theta | y_{1:t+1})$ is not too different from $p(x_{1:t}, \theta | y_{1:t})$ (that is, the evolution of this distribution is smooth enough in a certain sense), then $P(x_{1:t}^{(t)}, \theta^{(t)})$, the actual distribution of $(x_{1:t}^{(t)}, \theta^{(t)})$ produced by the algorithm, will track the series of distributions $p(x_{1:t}, \theta | y_{1:t})$. Similarly one can expect the chain $\theta^{(t)}$ to be asymptotically distributed according to $p(\theta | y_{1:t})$, i.e. $\lim_{t \to +\infty} \| p(\theta^{(t)}) - p(\theta | y_{1:t}) \| = 0$. It is known that under regularity conditions the series of distributions $p(\theta | y_{1:t})$ converges in a certain sense to a mixture of delta functions located on the global maxima of $p(\theta | y_{1:t})$. Under consistency conditions it is known that these maxima correspond to the true values $\theta_*$ of $\theta$.

This algorithm shares many common features with the simulated annealing algorithm. Indeed both are non-homogeneous Markov chains that track series of distributions which concentrate themselves on a set of points. Motivated by this analogy, it should not be surprising that convergence of the chain towards the global maxima of the marginal distribution $p(\theta | y_{1:t})$ requires that this series of distributions does not concentrate itself too quickly on its set of asymptotic global maxima. It can be shown in many cases that the rate at which this concentration occurs (in terms of the rate at which the variance around global maxima goes to zero) is $1/t$, which is far faster than what the simulated annealing algorithm requires. This is why we modify our algorithm so as to change the rate mentioned above to $1/\log(t + t_0)$. More precisely, we modify the two conditional distributions in order to define a new joint distribution and therefore define an alternative probabilistic model

$\bar{p}(x_{1:t} | y_{1:t}, \theta) = p(x_{1:t} | y_{1:t}, \theta)$

$\bar{p}(\theta | x_{1:t}, y_{1:t}) \propto [l_t(\theta; x_{1:t}, y_{1:t})]^{\beta_t} \, p(\theta)$

where $\beta_t^{-1}$ is the "temperature" and $l_t(\theta; x_{1:t}, y_{1:t})$ is recursively defined as follows,

$l_1(\theta; x_1, y_1) = p(x_1, y_1 | \theta)$

$l_i(\theta; x_{1:i}, y_{1:i}) = [l_{i-1}(\theta; x_{1:i-1}, y_{1:i-1})]^{1-\gamma_i} \times [p(x_i, y_i | x_{i-1}, \theta)]^{\gamma_i}$

for $i = 2, \ldots, t$ and a typically decreasing sequence of gains $\gamma_i \in (0, 1)$ (typically $\gamma_i = 1/i$). The idea behind the definition of $\bar{p}(\theta | x_{1:t}, y_{1:t})$ is the following. In view of the definition of a realistic sequential algorithm, it is clear that sampling from $p(x_{1:t} | y_{1:t}, \theta)$ will be practically impossible as $t \to +\infty$. Therefore a fixed number of hidden variables, say $x_{t-L:t}$, will effectively be sampled at each iteration. Consequently it is natural to give more weight to recent hidden variables and discard the information brought by old ones. This is what motivates the definition of $l_t(\theta; x_{1:t}, y_{1:t})$. Then intuitively the distribution proportional to $l_t(\theta; x_{1:t}, y_{1:t}) \, p(\theta)$ does not concentrate itself on its global maxima (contrary to $p(\theta | x_{1:t}, y_{1:t})$), and must be "annealed", hence the power $\beta_t \to +\infty$.
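Taken in the log domain, the recursion for $l_i$ is a weighted running average of the incremental log-densities. The sketch below assumes the increments $\log p(x_i, y_i | x_{i-1}, \theta)$ have already been evaluated for a fixed $\theta$; the function name `recursive_log_l` and its arguments are hypothetical.

```python
def recursive_log_l(log_increments, gains=None):
    """Compute log l_t from log l_i = (1 - gamma_i) log l_{i-1} + gamma_i log p_i.

    log_increments[0] is log p(x_1, y_1 | theta); later entries are the
    incremental log-densities for i = 2, ..., t.  If gains is None the
    typical choice gamma_i = 1/i is used; otherwise gains[k] is gamma_{k+2}.
    """
    log_l = log_increments[0]
    for i, log_p in enumerate(log_increments[1:], start=2):
        gamma = 1.0 / i if gains is None else gains[i - 2]
        log_l = (1.0 - gamma) * log_l + gamma * log_p
    return log_l
```

With $\gamma_i = 1/i$ the result is the arithmetic mean of the log-increments, so $l_t$ does not sharpen as $t$ grows; this is consistent with the remark that the distribution must then be annealed through $\beta_t$.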

Finally, we suggest the following improvement of the algorithm. As mentioned above, it will practically be impossible to sample a full trajectory from $p(x_{1:t} | y_{1:t}, \theta)$, as the computational complexity would increase over time. One can instead sample $x_{t-L:t}$ only, according to $p(x_{t-L:t} | y_{1:t}, \theta)$, while the past values $x_{1:t-L-1}$ are not modified. The algorithm will proceed as follows.


(Approximate) RDA

• Initialization: $t = 0$ and $\theta^{(0)}$.

• Iteration $t$

- $x_{t-L:t}^{(t)} \sim p(x_{t-L:t} | y_{1:t}, \theta^{(t-1)})$

- $\theta^{(t)} \sim p(\theta | x_{1:t}^{(t)}, y_{1:t})$

- $t \leftarrow t + 1$

If $p(x_{1:t-L-1} | y_{1:t}, \theta) = p(x_{1:t-L-1} | y_{1:t-1}, \theta)$, as in the case of mixture distributions (see Section 3), then the algorithm is "exact". Otherwise one implicitly assumes that

$p(x_{1:t-L-1} | y_{1:t}, \theta) \approx p(x_{1:t-L-1} | y_{1:t-1}, \theta)$,

which is a valid assumption if the optimal filter has exponential forgetting properties.
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2.2  Extensions  and  approximations 

The algorithm that we have presented above assumes that we know how to sample exactly from the two conditional distributions. Although this assumption is reasonable for a large class of problems, as we shall see below, it does not encompass other complex processes of interest. This is why we point out here that modern simulation techniques can be used in order to apply the algorithm above to more complex scenarios.

Metropolis-Hastings (MH)/Reversible jump MCMC (RJMCMC): First it should be pointed out that perfect simulation from the conditional distributions might be replaced with standard MCMC techniques, that is, one replaces sampling from a conditional distribution with sampling from a Markov transition kernel which admits that conditional as invariant distribution. This includes as a very important special case RJMCMC, which allows for on-line model selection.

Analytical/Particle filtering approximation of $p(x_{t-L:t} | y_{1:t}, \theta)$: It is important to notice that one of the distributions from which we want to generate samples is the distribution $p(x_{t-L:t} | y_{1:t}, \theta)$. In order to make the algorithm practical it is first necessary to replace $p(x_{t-L:t} | y_{1:t}, \theta^{(t-1)})$ with $p(x_{t-L:t} | y_{1:t}, \theta^{(0)}, \ldots, \theta^{(t-1)})$ (that is, we do not restart the optimal filter from time $t = 0$ but use the current approximation). This approach can be thought to be valid if $p(x_{t-L:t} | y_{1:t}, \theta)$ evolves smoothly as a function of $\theta$. When the filter does not possess nice analytical properties (as the Kalman filter and the HMM filter do), it is then possible to use Sequential Monte Carlo methods, which have proved to be efficient and versatile numerical techniques.


2.3 Stochastic approximation, convergence issues and open problems

For the models for which it is practically possible to apply the RDA algorithm, the conditional distribution $p(\theta | x_{1:t}, y_{1:t})$ depends on some sufficient statistics $\Phi^{(t)} = \frac{1}{t} \sum_{i=1}^{t} \Psi(x_{i-1:i}, y_i)$, and it is therefore possible to rewrite our algorithm in the following way:


(Approximate) RDA

• Initialization: $t = 0$, $\Phi^{(0)} = 0$ and $\theta^{(0)}$.

• Iteration $t$

- $x_{t-L:t}^{(t)} \sim p(x_{t-L:t} | y_{1:t}, \theta^{(t-1)})$

- $\theta^{(t)} \sim p(\theta | \Phi^{(t)})$

- $t \leftarrow t + 1$


Clearly $\theta$ can be considered to be a dummy variable, and $x_t$ is drawn from the distribution $\int p(x_t | y_{1:t}, \theta) \, p(\theta | \Phi^{(t-1)}) \, d\theta$. Then, the algorithm described above is clearly a stochastic approximation algorithm. It is interesting to notice that if we set $\beta_t = +\infty$, then $p(\theta | \Phi^{(t)})$ is a mixture of delta functions on its set of global maxima. This algorithm is therefore an on-line stochastic EM algorithm, and the RDA algorithm can then be thought of as a "noisy" version of this algorithm. This sheds some light on the possible convergence properties of our algorithms.

Stochastic approximation techniques can be used to prove the (local) convergence of this algorithm in the finite mixture case [2]. Although standard stochastic approximation results for global optimization do not apply here, it can be thought that the RDA will provide consistent parameter estimation under some regularity conditions. However, we would like to point out potential problems in a general setting. Several levels of approximation have been suggested in Section 2.2 in order to make the algorithms practical. Although we think that the analytical approximations proposed probably lead to valid algorithms, it seems to us that the use of particle filter techniques should lead to an accumulation of error, and in the most optimistic case lead to limited precision on the estimates of the true value of $\theta$.

3 Applications

3.1 Mixture of normal distributions

We first start with a case which does not require any approximation. Proofs of convergence are given in [2]. Consider the special case of finite Gaussian mixture distributions as in [7]. In this scenario, we observe independent identically distributed data $y_1, \ldots, y_t, \ldots$

$y_t | x_t \sim \mathcal{N}(\mu_{x_t}, \sigma^2_{x_t})$

and $\Pr(x_t = j) = \pi_j$, independently of $x_{t-1}$. We want to estimate $\theta = \{(\mu_j, \sigma_j^2, \pi_j); j \in S\}$.

To complete the Bayesian model, we assume that the $(\mu_j, \sigma_j^2)$, $j \in S$, are distributed according to the conjugate priors

Pj  I  Gi  ~  M  (vj,<7j/Pj)  •  a)  ~ZG(y'  1")

and $\Pi = (\pi_1, \ldots, \pi_S)$ is distributed according to a Dirichlet distribution

$\Pi \sim \mathcal{D}_S(\alpha_1, \ldots, \alpha_S)$.

Note that the prior does not have any effect on the final results. It is just used to define a valid probability density for the parameter $\theta$. In practice it proves to stabilize the algorithm (especially in the initial phase). The conditional distribution of the parameters given the missing data is given by

<Zj  ~  2X?  (^-  (fltiij.t  +  Pj),St,jJ
with  <w,,
„  „,2  ,  „  ,  o  -1-2  _  (pjt’j+PlVi.j)2
Pivj  +  Vj  Pj+IJ,n,,j
.  1  2  .
v-  (  PjVj  +  Wtj
fh  1
•  V  Pj+P<n,,j  ;
1  (pj  +/3tnt,j)J
n

where

$n_{1,j} = \delta_{x_1,j}$,  $n_{t,j} = (1 - \gamma_t) n_{t-1,j} + \gamma_t \delta_{x_t,j}$,

fij  =  £ru'yi,  5(,j  =  (1

}  ]2.,  =  }  t,j  =  (1  —  +  7  ■

Sampling the missing data is routine, as it is a finite discrete distribution. For our simulation we generated 200000 samples from a mixture of three normals with means -2.0, 0.5 and 1.5, variances 3.0, 0.5 and 1.0, and proportions 0.35, 0.6 and 0.05. We applied our algorithm with $\beta_t \propto \alpha t + \beta$. The results are presented in Fig. 1.
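The simulation setting and the missing-data step can be sketched as follows. This is an illustration only: it generates data from the three-component mixture quoted in the text and samples a component label from its finite discrete conditional distribution given an observation and a parameter value. The function names are hypothetical, and the parameter update itself is not shown.

```python
import math
import random

# Mixture used in the simulation described in the text (three normal components).
MEANS = (-2.0, 0.5, 1.5)
VARIANCES = (3.0, 0.5, 1.0)
WEIGHTS = (0.35, 0.6, 0.05)

def simulate_mixture(n, rng):
    """Draw n i.i.d. observations from the three-component mixture."""
    data = []
    for _ in range(n):
        j = rng.choices(range(len(WEIGHTS)), weights=WEIGHTS)[0]
        data.append(rng.gauss(MEANS[j], math.sqrt(VARIANCES[j])))
    return data

def allocation_probs(y, means, variances, weights):
    """Pr(x_t = j | y_t, theta): a finite discrete distribution over components."""
    dens = [
        w * math.exp(-0.5 * (y - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
        for m, v, w in zip(means, variances, weights)
    ]
    total = sum(dens)
    return [d / total for d in dens]

def sample_allocation(y, means, variances, weights, rng):
    """Sample the missing label x_t given y_t and the current parameter value."""
    probs = allocation_probs(y, means, variances, weights)
    return rng.choices(range(len(probs)), weights=probs)[0]
```

Because the observations are independent, only the current label needs to be sampled at each time step, which is why the algorithm is "exact" in this case.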
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Figure  1:  The  first  200000  iterations  of  the  algorithm,  with  from 
top  to  bottom,  the  means,  the  variances  and  the  proportions 


3.2 Noisy AR process

In this case the signal and observation processes are described by the two following equations

$x_{t+1} = a x_t + \sigma_v v_{t+1}$

$y_t = x_t + \sigma_w w_t,$

that is, for simplicity, we restrict ourselves to a noisy AR(1) process. The unknown fixed parameters are $\theta = (a, \sigma_v^2, \sigma_w^2)$: the AR coefficient and the variances of the dynamic and observation noises, respectively. We have used standard normal and inverse gamma distributions for the priors [1]. We sample from $p(x_{t-L:t} | y_{1:t}, \theta^{(0)}, \ldots, \theta^{(t-1)})$ using the Kalman filter followed by a backward sampling step. Further comparison with other approximation techniques will be made in the future. We show our results in Fig. 2, which seem to lead to reasonable values of the parameters (the true values here were $a = 0.6$ and $\sigma_v = 0.5$). We however remind the reader of the potential validity problems of this approach pointed out in Section 2.3.

Figure 2: From top to bottom, the AR coefficient, the variance of the dynamic noise.


[2] Andrieu, C., Doucet, A. & Tadic, V. (2001) Recursive data augmentation for parameter estimation in finite mixture distributions, forthcoming.

[3] Dempster, A.P., Laird, N.M. & Rubin, D.B. (1977). Maximum likelihood from incomplete data via the EM algorithm. J. R. Statist. Soc. B 39, 1-38.

[4] Doucet A., de Freitas J.F.G. & Gordon N.J. (eds.) (2001) Sequential Monte Carlo Methods in Practice. New York: Springer-Verlag.

[5] Qian, W. & Titterington, D.M. (1991). Estimation of parameters in hidden Markov models. Phil. Trans. R. Soc. London A 337, 407-28.
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Methods.  New  York:  Springer- Verlag. 

[7]  Titterington  D.  M.,  Smith  A.  F.  M.  and  Makov  U.  E.  (1985) 
Statistical  Analysis  of  Finite  Mixture  Distributions ,  London: 
Wiley. 
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ABSTRACT 

We describe methods for applying Monte Carlo filtering and smoothing for estimation of unobserved states in a non-linear state space model. By exploiting the statistical structure of the model, we develop a Rao-Blackwellised Particle Smoother. The suggested algorithm is tested with real speech and audio data and the results are shown and compared with those generated using the generic particle smoother and the extended Kalman filter. It is found that the suggested algorithm gives better results.

1.  INTRODUCTION 

Many problems in applied statistics, statistical signal processing and time series analysis can be stated in a state space form as follows,

$x_{t+1} \sim f(x_{t+1} | x_t)$    State evolution density

$y_{t+1} \sim g(y_{t+1} | x_{t+1})$    Observation density

where $\{x_t\}$ are unobserved states of the system and $\{y_t\}$ are observations made over some time interval $t \in \{1, \ldots, T\}$. $f(\cdot|\cdot)$ and $g(\cdot|\cdot)$ are pre-specified state evolution and observation densities.

A primary concern in many state-space inference problems is sequential estimation of the filtering distribution $p(x_t | y_{1:t})$ and simulation of the entire smoothing distribution $p(x_{1:t} | y_{1:t})$, where $y_{1:t} = \{y_1, y_2, \ldots, y_t\}$ and $x_{1:t} = \{x_1, x_2, \ldots, x_t\}$. Updating of the filtering distribution can be achieved in principle using the standard filtering recursions

$p(x_{t+1} | y_{1:t}) = \int p(x_t | y_{1:t}) \, f(x_{t+1} | x_t) \, dx_t$

$p(x_{t+1} | y_{1:t+1}) = \frac{g(y_{t+1} | x_{t+1}) \, p(x_{t+1} | y_{1:t})}{p(y_{t+1} | y_{1:t})}$

Similarly, smoothing can be performed recursively backwards in time using the smoothing formula

$p(x_t | y_{1:T}) = \int p(x_{t+1} | y_{1:T}) \, \frac{p(x_t | y_{1:t}) \, f(x_{t+1} | x_t)}{p(x_{t+1} | y_{1:t})} \, dx_{t+1}$

In practice these filtering and smoothing computations can only be performed in closed form for linear Gaussian models using the Kalman filter / smoother and for finite state-space hidden Markov models.
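For the finite state-space case the two filtering recursions reduce to sums and can be written out directly. The sketch below is a minimal illustration; the function name `hmm_filter` and the list-of-lists representation of the transition matrix are assumptions made for this example.

```python
def hmm_filter(obs_lik, trans, init):
    """Exact filtering for a finite-state hidden Markov model.

    obs_lik[t][j] = g(y_t | x_t = j)        (observation likelihoods)
    trans[i][j]   = f(x_{t+1} = j | x_t = i) (transition probabilities)
    init[j]       = prior probability of state j before the first observation
    Returns the list of filtering distributions p(x_t | y_{1:t}).
    """
    n = len(init)
    pred = list(init)                 # p(x_t | y_{1:t-1})
    filtered = []
    for lik in obs_lik:
        # Update step: multiply by the likelihood and normalise.
        unnorm = [l * p for l, p in zip(lik, pred)]
        norm = sum(unnorm)            # equals p(y_t | y_{1:t-1})
        post = [u / norm for u in unnorm]
        filtered.append(post)
        # Prediction step: propagate through the transition probabilities.
        pred = [sum(post[i] * trans[i][j] for i in range(n)) for j in range(n)]
    return filtered
```

The normalising constant computed in the update step is the one-step predictive likelihood, so its logarithms can be accumulated to obtain the log-likelihood of the observations.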

For non-linear non-Gaussian models, there is no general analytic expression for the required density functions. The extended Kalman filter is a popular approach for non-linear models, which linearises the model so that the Kalman filter can be applied.

Another approximation strategy is that of sequential Monte Carlo methods, also known as Particle Filters [4, 3]. Within the particle filter framework, the filtering distribution is approximated with an empirical distribution formed from point masses, or particles,

$p(x_t | y_{1:t}) \approx \sum_{i=1}^{N} w_t^{(i)} \delta(x_t - x_t^{(i)}), \quad \sum_{i=1}^{N} w_t^{(i)} = 1, \quad w_t^{(i)} \geq 0$

where $\delta(\cdot)$ is the Dirac delta function and $w_t^{(i)}$ is a weight attached to particle $x_t^{(i)}$. In addition to the particle filter, particle smoothers, which are a simple and efficient method for generating realisations from the entire smoothing density $p(x_{1:T} | y_{1:T})$ using the particulate approximation, have been developed [5].
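A minimal bootstrap particle filter illustrates the weighted point-mass approximation. The model below (a scalar AR(1) state observed in Gaussian noise) and every parameter value are assumptions chosen only to keep the example short; the state evolution density is used as the proposal and multinomial resampling is applied at every step.

```python
import math
import random

def bootstrap_filter(y, n_particles=500, a=0.9, sig_v=0.5, sig_w=0.5, seed=0):
    """Bootstrap particle filter for the assumed toy model
    x_t = a x_{t-1} + sig_v v_t,  y_t = x_t + sig_w w_t  (v_t, w_t standard normal).
    Returns the particle estimates of E[x_t | y_{1:t}]."""
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    means = []
    for obs in y:
        # Propagate each particle through the state evolution density f.
        particles = [a * x + sig_v * rng.gauss(0.0, 1.0) for x in particles]
        # Weights proportional to the observation density g(y_t | x_t).
        logw = [-0.5 * ((obs - x) / sig_w) ** 2 for x in particles]
        top = max(logw)
        w = [math.exp(l - top) for l in logw]
        total = sum(w)
        w = [wi / total for wi in w]          # weights sum to one
        means.append(sum(wi * x for wi, x in zip(w, particles)))
        # Multinomial resampling returns an equally weighted particle set.
        particles = rng.choices(particles, weights=w, k=n_particles)
    return means
```

Subtracting the largest log-weight before exponentiating avoids numerical underflow when an observation lies far from most particles.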

2. AUDIO MODELS

Speech signals are inherently time-varying in nature, and any realistic representation should thus involve a model whose parameters evolve over time. One such model is the time-varying autoregression (TVAR)

$u_t = \sum_{i=1}^{p} a_{t,i} u_{t-i} + e_t$

Here $\{u_t\}$ is the audio signal process, $a_t = [a_{t,1}, \ldots, a_{t,p}]'$ is the $p$th order AR coefficient vector and $e_t$ is a Gaussian excitation at time $t$ having variance $\sigma^2_{e_t}$. A Gaussian random walk model is assumed for the log-variance $\phi_{e_t} = \log(\sigma^2_{e_t})$,

$f(\phi_{e_t} | \phi_{e_{t-1}}, \sigma^2_{\phi_e}) = \mathcal{N}(\mu_{\phi_t}, \sigma^2_{\phi_e})$  (2)

where $\mu_{\phi_t} = \log(\alpha \sigma^2_{e_{t-1}})$ and $\alpha$ is a coefficient just less than 1.

For the time variation in $a_t$, we choose to work in the time-varying partial correlation (PARCOR) coefficient domain [5, 6]. Improved stability can be achieved by ensuring each reflection coefficient $\rho_{t,i}$ is within the interval (-1, +1). The constrained PARCOR random walk model is

$f(\rho_t | \rho_{t-1}, \sigma_a^2) \propto \begin{cases} \mathcal{N}(\rho_{t-1}, \sigma_a^2 I) & \text{if } \max\{|\rho_{t,i}|\} < 1 \\ 0 & \text{otherwise} \end{cases}$  (3)

where $\rho_t = [\rho_{t,1}, \ldots, \rho_{t,p}]$. The TV-PARCOR is assumed because it provides a better physical representation of audio signals. This arises since the TV-PARCOR model can be regarded as a time-varying acoustical tube mechanism, which is a reasonable approximation for speech and many musical instruments.
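The link between the PARCOR domain and the AR coefficients can be illustrated with the standard step-up (Durbin-Levinson) recursion. The sketch below assumes the sign convention in which the AR model is written $u_t = \sum_i a_i u_{t-i} + e_t$ and the last coefficient of each order equals the reflection coefficient; other texts use the opposite sign. The function name `parcor_to_ar` is hypothetical.

```python
def parcor_to_ar(rho):
    """Convert reflection (PARCOR) coefficients to AR coefficients a_1, ..., a_p.

    Step-up recursion, order m = 1, ..., p:
        a_m^(m) = rho_m
        a_i^(m) = a_i^(m-1) - rho_m * a_{m-i}^(m-1),  i = 1, ..., m-1
    Every |rho_m| < 1 corresponds to a stable AR filter, which is why the
    random walk is constrained to the interval (-1, +1).
    """
    a = []
    for m, k in enumerate(rho, start=1):
        if abs(k) >= 1.0:
            raise ValueError("reflection coefficients must lie in (-1, 1)")
        # a holds the order m-1 coefficients; index i0 stands for a_{i0+1}.
        a = [a[i0] - k * a[m - 2 - i0] for i0 in range(m - 1)] + [k]
    return a
```

For example, `parcor_to_ar([0.5, 0.3])` gives AR(2) coefficients of about 0.35 and 0.3, which satisfy the usual AR(2) stability conditions.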

The full specification of the state space model is then as follows. The state vector $x_t$ is partitioned as $[z_t, \theta_t]$, with $z_t = u_{t-p+1:t}$ and $\theta_t$ being the signal state and the parameter state respectively. The signal is assumed to be submerged in white Gaussian noise (WGN) with known variance $\sigma_v^2$, i.e.

$g(y_t | x_t) = \mathcal{N}(y_t; u_t, \sigma_v^2)$

The parameter vector $\theta_t$ is further partitioned as $[a_t, \phi_{e_t}]$.

3. MONTE CARLO FILTERING AND SMOOTHING

Rearrange the filtering distribution $p(x_t | y_{1:t})$,

$p(x_t | y_{1:t}) \propto g(y_t | x_t) \, p(x_t | y_{1:t-1}) = \int g(y_t | x_t) \, f(x_t | x_{t-1}) \, p(x_{0:t-1} | y_{1:t-1}) \, dx_{0:t-1}$  (4)

Provided that a particle approximation to $p(x_{0:t-1} | y_{1:t-1})$ has already been generated,

$p(x_{0:t-1} | y_{1:t-1}) \approx \frac{1}{N} \sum_{i=1}^{N} \delta(x_{0:t-1} - x_{0:t-1}^{(i)})$

then, assuming we can evaluate $f(x_t | x_{t-1})$ and $g(y_t | x_t)$ pointwise, we generate, for each state trajectory $x_{0:t-1}^{(i)}$, a random sample from a proposal distribution $q(x_t | x_{0:t-1}^{(i)}, y_{1:t})$. The filtering distribution (4) can then be approximated as

$p(x_t | y_{1:t}) \approx \sum_{i=1}^{N} w_t^{(i)} \delta(x_t - x_t^{(i)})$

with

$w_t^{(i)} \propto \frac{g(y_t | x_t^{(i)}) \, f(x_t^{(i)} | x_{t-1}^{(i)})}{q(x_t^{(i)} | x_{0:t-1}^{(i)}, y_{1:t})}$

In addition to the Monte Carlo filtering algorithm, Monte Carlo smoothing [5] methods have been developed to generate realisations from the joint smoothing density $p(x_{1:T} | y_{1:T})$. Sample realisations are obtained using the following factorisation

$p(x_{1:T} | y_{1:T}) = p(x_T | y_{1:T}) \prod_{t=1}^{T-1} p(x_t | x_{t+1:T}, y_{1:T})$  (5)

where, given the particulate approximation to $p(x_t | y_{1:t})$ and using the Markovian assumptions of the model, we can write,

$p(x_t | x_{t+1:T}, y_{1:T}) \propto p(x_t | y_{1:t}) \, f(x_{t+1} | x_t) \approx \sum_{i=1}^{N} w_{t|t+1}^{(i)} \delta(x_t - x_t^{(i)})$  (6)

with the modified weights

$w_{t|t+1}^{(i)} \propto w_t^{(i)} f(x_{t+1} | x_t^{(i)})$  (7)

This revised particle distribution can now be used to generate states successively in the reverse-time direction, conditioning upon future states.

4. RAO-BLACKWELLISED PARTICLE FILTERING AND SMOOTHING

One of the major drawbacks of any Monte Carlo filtering / smoothing strategy is that sampling in high-dimensional spaces can be inefficient. In some cases, however, the model has "tractable substructure" [2], which can be analytically marginalised out, conditional on other state variables. The advantage of this strategy is that it can drastically reduce the size of the space over which we need to sample, and hence the estimation variance [4].

Marginalising out some of the variables is an example of a standard statistical variance reduction strategy known as Rao-Blackwellisation; see [1] for a general discussion on the topic.

In this paper we focus on applying Rao-Blackwellisation to fixed-interval smoothing; that is, given $y_{1:T}$, we would like to simulate from the entire state density $p(x_{1:T} | y_{1:T})$. The reason for this is that a greater degree of smoothing can be important for the convincing reconstruction of audio signals.

4.1. Rao-Blackwellised Particle Filter

First we review the standard RB particle filter [2, 4]. Assume that the state vector $x_{1:t}$ can be partitioned as $[z_{1:t}, \theta_{1:t}]$ and $z_{1:t}$ can be marginalised out analytically. For instance, if conditional on $\theta_{1:t}$, $z_{1:t}$ reduces to a linear Gaussian state-space system, then all the integration can be performed analytically on-line using the Kalman filter and the prediction error decomposition.
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Let  us  consider  the  marginal  filtering  distribution. 

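$p(\theta_t | y_{1:t}) = \int p(z_t, \theta_t | y_{1:t}) \, dz_t \propto \int p(y_t | \theta_{1:t}, y_{1:t-1}) \, f(\theta_t | \theta_{t-1}) \, p(\theta_{0:t-1} | y_{1:t-1}) \, d\theta_{0:t-1}$

Given the particle approximation to $p(\theta_{0:t-1} | y_{1:t-1})$, new particles $\theta_t^{(i)}$ are drawn from $f(\theta_t | \theta_{t-1}^{(i)})$, and $p(\theta_t | y_{1:t})$ is approximated by

$p(\theta_t | y_{1:t}) \approx \sum_{i=1}^{N} w_t^{(i)} \delta(\theta_t - \theta_t^{(i)})$

with

$w_t^{(i)} \propto p(y_t | \theta_{1:t}^{(i)}, y_{1:t-1})$

Under the assumption of a conditionally linear Gaussian structure, $p(y_t | \theta_{1:t}, y_{1:t-1})$ can be evaluated using the Kalman filter and the prediction error decomposition.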
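The weight evaluation can be sketched for a scalar conditionally linear Gaussian model. The model used here, $z_t = a_t z_{t-1} + e_t$ with excitation variance `q` and $y_t = z_t + v_t$ with noise variance `r`, is an assumption made to keep the example to one dimension, and the function name `kalman_step` is hypothetical. For each particle, one Kalman step returns the updated mean and variance of $z_t$ together with the log of the predictive likelihood that serves as the particle weight.

```python
import math

def kalman_step(m, P, y, a, q, r):
    """One Kalman filter step for the assumed scalar model
    z_t = a z_{t-1} + e_t (variance q),  y_t = z_t + v_t (variance r).

    m, P : mean and variance of z_{t-1} given y_{1:t-1} and the particle's parameters
    Returns (m_new, P_new, log_w), where log_w is the log predictive likelihood
    of y_t, i.e. the log of the unnormalised RB particle weight.
    """
    m_pred = a * m                    # one-step-ahead prediction of the state
    P_pred = a * a * P + q
    S = P_pred + r                    # innovation (prediction error) variance
    innov = y - m_pred                # prediction error
    log_w = -0.5 * (math.log(2.0 * math.pi * S) + innov * innov / S)
    K = P_pred / S                    # Kalman gain
    return m_pred + K * innov, (1.0 - K) * P_pred, log_w
```

Each particle carries its own `(m, P)` pair, so the signal state is never sampled during filtering; only the parameters `a` and `q` are.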

4.2.  Rao-Blackwellised  Particle  Smoother 

We modify the generic particle smoother [5] to incorporate Rao-Blackwellisation to form a RB Particle Smoother.

Given the particulate approximation for the parameter filtering distribution $p(\theta_t | y_{1:t})$, the marginal smoothing distribution $p(\theta_t | z_{t+1:T}, \theta_{t+1:T}, y_{1:T})$ is approximated by

$p(\theta_t | z_{t+1:T}, \theta_{t+1:T}, y_{1:T}) \approx \sum_{i=1}^{N} w_{t|t+1}^{(i)} \delta(\theta_t - \theta_t^{(i)})$  (8)

with the modified weight $w_{t|t+1}^{(i)}$ being

$w_{t|t+1}^{(i)} = w_t^{(i)} \, p(z_{t+1}, \theta_{t+1} | \theta_{1:t}^{(i)}, y_{1:t})$  (9)

Smoothed realisations for the parameters $\{\theta_t; t = 1, \ldots, T\}$ can be generated recursively backward in time using the equations above in a similar manner to the generic particle smoother [5]. We now prove the correctness of this Rao-Blackwellised approximation.

Proof: By partitioning the state vector as $x_{1:T} = [z_{1:T}, \theta_{1:T}]$, we factorise the smoothing density function $p(x_{1:T} | y_{1:T})$,

$p(z_{1:T}, \theta_{1:T} | y_{1:T}) = p(z_T, \theta_T | y_{1:T}) \times \prod_{t=1}^{T-1} p(z_t, \theta_t | z_{t+1:T}, \theta_{t+1:T}, y_{1:T})$  (10)

The conditional smoothing density $p(z_t, \theta_t | z_{t+1:T}, \theta_{t+1:T}, y_{1:T})$ can be further factorised,

$p(z_t, \theta_t | z_{t+1:T}, \theta_{t+1:T}, y_{1:T}) = p(z_t | \theta_{t:T}, z_{t+1:T}, y_{1:T}) \times p(\theta_t | z_{t+1:T}, \theta_{t+1:T}, y_{1:T})$  (11)

Using the particle approximation from the forward sweep of the RB particle filter

$p(\theta_t | y_{1:t}) \approx \sum_{i=1}^{N} w_t^{(i)} \delta(\theta_t - \theta_t^{(i)})$

and the Markovian assumptions of the model, we marginalise the current signal state $z_t$ out from the joint density function (11), yielding the following approximation,

$p(\theta_t | z_{t+1:T}, \theta_{t+1:T}, y_{1:T}) \propto p(z_{t+1}, \theta_{t+1} | \theta_t, y_{1:t}) \, p(\theta_t | y_{1:t}) \approx \sum_{i=1}^{N} w_{t|t+1}^{(i)} \delta(\theta_t - \theta_t^{(i)})$

with the weight being $w_{t|t+1}^{(i)} = w_t^{(i)} \, p(z_{t+1}, \theta_{t+1} | \theta_{1:t}^{(i)}, y_{1:t})$, as required.

Given $\theta_{t+1:T}$ and $z_{t+1:T}$, $\theta_t$ is drawn using the approximate smoothing distribution (8) and the modified weight (9). Provided that $\theta_t = \theta_t^{(j)}$, the smoothed signal realisation $z_t$ is obtained by sampling from the conditional density function $p(z_t | \theta_{1:t}^{(j)}, z_{t+1:T}, \theta_{t+1:T}, y_{1:T})$, which is Gaussian.

Under the assumption of a conditionally Gaussian structure for the signal, the modified weight can be computed efficiently using the one-step ahead prediction equation from the Kalman filter.

By repeating the sampling process

$\theta_t \sim \sum_{i=1}^{N} w_{t|t+1}^{(i)} \delta(\theta_t - \theta_t^{(i)})$

$z_t \sim p(z_t | \theta_{1:t}^{(j)}, z_{t+1:T}, \theta_{t+1:T}, y_{1:T})$

recursively backward in time, approximate samples $(z_{1:T}, \theta_{1:T})$ are drawn from $p(z_{1:T}, \theta_{1:T} | y_{1:T})$.
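The backward sampling recursion is easiest to see in its generic form, equations (5) to (7), without the Rao-Blackwellised weights. The sketch below assumes the forward filter has stored, for every time step, a list of scalar particles and their normalised weights; the function name `backward_simulate` and the callable `log_f`, which returns $\log f(x_{t+1} | x_t)$, are hypothetical.

```python
import math
import random

def backward_simulate(particles, weights, log_f, rng):
    """Draw one smoothed trajectory from stored filter output.

    particles[t][i], weights[t][i] : particle approximation of p(x_t | y_{1:t})
    log_f(x_next, x)               : log state evolution density log f(x_next | x)
    """
    T = len(particles)
    traj = [None] * T
    # Start from the final filtering distribution.
    traj[T - 1] = rng.choices(particles[T - 1], weights=weights[T - 1])[0]
    for t in range(T - 2, -1, -1):
        # Modified weights: filtering weight times f(x_{t+1} | x_t^{(i)}).
        lf = [log_f(traj[t + 1], x) for x in particles[t]]
        top = max(lf)
        mod = [w * math.exp(l - top) for w, l in zip(weights[t], lf)]
        traj[t] = rng.choices(particles[t], weights=mod)[0]
    return traj
```

Calling the function M times with the same stored particles yields M smoothed realisations; in the Rao-Blackwellised version the factor $f(x_{t+1} | x_t^{(i)})$ is replaced by the predictive density in (9) and the signal state is then drawn from its Gaussian conditional.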

5.  EXPERIMENTAL  RESULTS 

Extensive tests have been carried out to investigate the effectiveness of the suggested algorithms using a variety of audio datasets. It is found that our proposed Rao-Blackwellised (TV-PARCOR) particle smoother consistently outperforms the classical extended Kalman smoother and the generic particle smoother. Some representative examples of the tests conducted are described in detail.

In our simulations, the fixed hyperparameters used are $\sigma_a^2 = 10^{-4}$, $\alpha = 0.995$ and $\sigma^2_{\phi_e} = 10^{-6}$. Finally, the TVAR model order is fixed to $p = 6$.
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5.1.  Test  1  —  Rao-Blackwellised  Particle  Smoother 

A  section  of  speech  data  is  used  to  compare  the  perfor¬ 
mance  of  the  extended  Kalman  smoother,  the  generic  par¬ 
ticle  smoother  and  the  RB  particle  smoother. 

For the generic particle smoother and the RB particle smoother, the TV-PARCOR random walk model is used. N = 200 particles are used and smoothing is applied to generate M = 20 realisations. The input SNR is 10.2 dB and the output SNRs of the different algorithms are summarised below:

Algorithm                    SNR out
Extended Kalman Smoother     12.1 dB
Generic Particle Smoother    10.8 dB
RB Particle Smoother         13.3 dB

As  a  result  of  this  and  other  simulations  on  audio  data, 
we  conclude  that  the  RB  particle  smoother  outperforms  the 
generic  particle  smoother  and  the  Extended  Kalman  smoother. 
It  confirms  experimentally  the  theory  that  by  marginalising 
out  some  of  the  state  variables,  the  estimation  performance 
will  improve. 
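The variance reduction from marginalisation can be illustrated on a toy problem. The sketch below does not use the audio model of this paper: it assumes a two-variable Gaussian model (theta ~ N(0, 1), x | theta ~ N(theta, 1)) chosen only because the conditional mean is known in closed form, and all names are hypothetical.

```python
import numpy as np

def crude_estimate(rng, n):
    """Average of x drawn jointly: theta ~ N(0, 1), x | theta ~ N(theta, 1)."""
    theta = rng.standard_normal(n)
    x = theta + rng.standard_normal(n)
    return x.mean()

def rao_blackwell_estimate(rng, n):
    """Average of E[x | theta] = theta, i.e. x is integrated out analytically."""
    theta = rng.standard_normal(n)
    return theta.mean()

rng = np.random.default_rng(1)
reps, n = 2000, 100
var_crude = np.var([crude_estimate(rng, n) for _ in range(reps)])
var_rb = np.var([rao_blackwell_estimate(rng, n) for _ in range(reps)])
```

Both estimators target the same mean, but the second one samples only theta, so its variance is roughly half that of the first in this toy setting.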

5.2.  Test  2  —  TV-PARCOR  model 

As the extended Kalman filter is a computationally cheap algorithm, the RB particle smoother has to show consistent improvement over the extended Kalman smoother. Therefore we re-run the test using two pieces of high quality music. We also include the RB (TVAR) particle smoother in the test in order to verify experimentally our suggestion that the TV-PARCOR model is a better physical representation of audio signals than the TVAR model.

The two pieces of music used are a section of violin playing (SNR in = 5.8 dB) and a section of brass (SNR in = 10.4 dB).

In each case, N = 100 particles are used and M = 10 smoothed trajectories are generated. The output SNRs of the different algorithms are summarised below:

Algorithm                    violin     brass
Extended Kalman Smoother     6.9 dB     6.3 dB
RB TVAR Smoother             10.0 dB    16.8 dB
RB TV-PARCOR Smoother        12.1 dB    17.0 dB

We  conclude  that  the  RB  particle  smoother  outperforms 
the  extended  Kalman  filter  very  dramatically  in  terms  of 
SNR  for  these  extracts,  giving  a  significant  noise  reduc¬ 
tion  when  the  extended  Kalman  filter  effectively  fails.  In 
addition,  the  TV-PARCOR  model  outperforms  the  standard 
TVAR  model,  with  the  amount  of  improvement  depending 
on  the  type  of  input  material. 

5.3.  Test  3  —  Different  input  SNR 

In our final test, we investigate the performance of the RB particle smoother algorithm at different input SNR levels and compare the results with those generated using the extended Kalman smoother. A section of piano music is used for this purpose.


The output SNRs of the two algorithms for noisy signals at different input SNR levels are summarised below:

SNR in    RB smoother    Extended Kalman smoother
0 dB      8.5 dB         4.2 dB
10 dB     13.9 dB        13.8 dB
20 dB     20.9 dB        20.9 dB

It is found that the Rao-Blackwellised particle smoother performs significantly better than the extended Kalman smoother at low SNR, while both algorithms perform equally well at high SNR.

6.  CONCLUSIONS 

In  this  article,  we  have  applied  sequential  Monte  Carlo  smooth¬ 
ing  methods  to  audio  signal  enhancement  problems.  A  TV- 
PARCOR  model  was  proposed  for  modelling  the  time  varia¬ 
tion  of  the  AR  coefficients,  which  has  a  good  physical  inter¬ 
pretation  in  terms  of  acoustical  tube  models.  In  cases  where 
the  models  concerned  have  some  “tractable  substructure”, 
Rao-Blackwellisation can be applied to reduce the estimation variance. An RB smoothing algorithm was developed for this purpose. Extensive tests have been carried out to
investigate  the  effectiveness  of  the  suggested  algorithms  ap¬ 
plied  to  a  variety  of  audio  data.  It  is  found  that  the  RB 
(TV-PARCOR)  particle  smoother  outperforms  classical  ap¬ 
proaches  such  as  the  extended  Kalman  smoother  and  the 
generic  particle  smoother. 
ABSTRACT

This paper presents a parameter estimation method for the Candy model based on Monte Carlo approximation of the likelihood function. In order to produce such an approximation, a Metropolis-Hastings style algorithm [3] for simulating the Candy model [10, 11] is introduced.


1.  SET-UP  AND  NOTATION 


In  the  last  decade  in  image  processing,  a  few  researchers 
moved  away  from  pixel-based  methods  to  more  high-level 
image  analysis  based  on  point  process  models.  In  this  spirit, 
Stoica,  Descombes  and  Zerubia  [11]  introduced  a  marked 
point  process  model  for  line  segments,  dubbed  Candy,  as 
prior distribution for the image analysis problem of extracting linear networks such as roads or rivers from images obtained by aerial and high resolution satellite photography.

More formally, represent a line segment as a point in some compact subset $K \subset \mathbb{R}^2$ of strictly positive volume $0 < \nu(K) < \infty$ with an attached mark taking values in $[l_{\min}, l_{\max}] \times [0, \pi)$ for some $0 < l_{\min} < l_{\max} < \infty$. Each marked point $(k, l, \theta)$ can be interpreted as a line segment with midpoint $k$, length $l$, and orientation $\theta$. When applying the model to road extraction, it is natural to include marks for characteristics such as width and color as well. A configuration of line segments is a finite set of marked points. The probabilistic model is defined by its density $p$ with respect to a unit rate Poisson process on $K$ with independently and uniformly distributed marks as follows. At $\mathbf{s} = \{s_1, \ldots, s_n\}$ with $s_i = (k_i, l_i, \theta_i) \in K \times [l_{\min}, l_{\max}] \times [0, \pi)$, $i = 1, \ldots, n$,


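$$p(\mathbf{s}) = \alpha \prod_{i=1}^{n} \exp\left[\frac{l_i - l_{\max}}{l_{\max}}\right] \gamma_f^{n_f(\mathbf{s})}\, \gamma_s^{n_s(\mathbf{s})}\, \gamma_d^{n_d(\mathbf{s})}\, \gamma_o^{n_o(\mathbf{s})}\, \gamma_r^{n_r(\mathbf{s})} \qquad (1)$$
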


where $\gamma_f, \gamma_s, \gamma_d > 0$ and $\gamma_o, \gamma_r \in (0, 1)$ are the model parameters. Stoica et al. recommend $\gamma_f < \gamma_s < \gamma_d$, in order to favor configurations containing more connected segments than free ones. The sufficient statistics $n_f(\mathbf{s})$, $n_s(\mathbf{s})$, $n_d(\mathbf{s})$, $n_o(\mathbf{s})$, $n_r(\mathbf{s})$ respectively represent the number of
'free'  segments,  the  number  of  segments  with  a  single  one 
of  its  endpoints  near  another  segment  endpoint,  the  number 
of  segments  with  both  extremities  connected,  the  number 
of  pairs  of  segments  crossing  at  too  sharp  angles,  and  the 
number  of  pairs  that  are  disoriented.  Thus,  there  are  penal¬ 
ties  attached  to  each  free  and  singly  connected  segment,  as 
well  as  to  each  sharp  crossing  and  to  every  disagreement  in 
orientation.  For  more  details  on  the  model  and  its  applica¬ 
tions  to  network  extraction  see  [11],  and  [9]  where  the  au¬ 
thors  prove  existence  and  Ruelle  stability  of  p  and  establish 
various  Markov  properties. 
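To make the role of the penalties concrete, the following sketch evaluates only the part of the log density that depends on the five interaction parameters, namely the sum of each count times the log of its parameter. It is an illustration under stated assumptions: the per-segment exponential factor and the normalising constant are deliberately left out, the function name is hypothetical, and the numerical values are those listed with Fig. 1 (reading the garbled third parameter as 12.2).

```python
import math

# Parameter values and observed statistics as listed with Fig. 1 of the text.
GAMMA = {"f": 0.0002, "s": 0.05, "d": 12.2, "o": 0.08, "r": 0.08}
STATS = {"f": 4, "s": 34, "d": 63, "o": 12, "r": 9}

def log_interaction(stats, gamma):
    """Sum over k of n_k(s) * log(gamma_k): log of the gamma-dependent part of (1)."""
    return sum(stats[k] * math.log(gamma[k]) for k in gamma)

base = log_interaction(STATS, GAMMA)
# One extra free segment is penalised, one extra doubly connected segment is rewarded.
one_more_free = log_interaction({**STATS, "f": STATS["f"] + 1}, GAMMA)
one_more_double = log_interaction({**STATS, "d": STATS["d"] + 1}, GAMMA)
```

With these values each additional free segment lowers the log density by about 8.5, while each additional doubly connected segment raises it by about 2.5, which is how the model favours connected networks.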


2.  METROPOLIS-HASTINGS  ALGORITHMS 

The Candy model (1) is too complicated to sample from directly. Hence, we apply Markov chain Monte Carlo techniques [6] to construct a Markov chain which has the Candy model $\pi$ as its equilibrium distribution. Here we use the Metropolis-Hastings sampler, a flexible proposal-acceptance technique that is well adapted to point processes [3, 7]. In its generic form, the transition proposals are uniformly distributed births and deaths. The acceptance probabilities are based on the likelihood ratio of the new state compared to the old one. Due to the results in [2], the algorithm converges in total variation to $\pi$ for $\pi$-almost all initial configurations. The theorem applies equally to any other pair of strictly positive birth and death kernels.
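The generic birth-death sampler can be sketched as below. This is a minimal sketch, not the sampler used for the experiments: it assumes births and deaths are proposed with probability 1/2 each, and, to keep the example checkable, it is run on a stand-in density (a homogeneous Poisson process with a hypothetical intensity `beta`) rather than on the Candy model. For the Candy model, `log_h` would evaluate the unnormalised density and `sample_point` would also draw a length and an orientation uniformly.

```python
import math
import random

def birth_death_mh(log_h, area, sample_point, n_iter, rng):
    """Birth-death Metropolis-Hastings for a density h w.r.t. a unit rate Poisson process.

    log_h(config)  : log of the unnormalised density of a list of points
    area           : volume nu(K) of the point space
    sample_point() : draws one uniformly distributed (marked) point
    """
    config, counts = [], []
    for _ in range(n_iter):
        n = len(config)
        if rng.random() < 0.5:                       # propose a birth
            proposal = config + [sample_point()]
            log_ratio = log_h(proposal) - log_h(config) + math.log(area / (n + 1))
        elif n > 0:                                  # propose a death
            i = rng.randrange(n)
            proposal = config[:i] + config[i + 1:]
            log_ratio = log_h(proposal) - log_h(config) + math.log(n / area)
        else:                                        # empty configuration: nothing to delete
            counts.append(n)
            continue
        if rng.random() < math.exp(min(0.0, log_ratio)):
            config = proposal                        # accept, otherwise keep the old state
        counts.append(len(config))
    return config, counts

# Stand-in target: Poisson process on [0, 256]^2 with 5 points on average.
rng = random.Random(0)
side = 256.0
beta = 5.0 / side ** 2
log_h = lambda cfg: len(cfg) * math.log(beta)
sample_point = lambda: (rng.uniform(0.0, side), rng.uniform(0.0, side))
config, counts = birth_death_mh(log_h, side ** 2, sample_point, 50000, rng)
mean_count = sum(counts) / len(counts)
```

For this stand-in target the average number of points along the chain should settle near 5, which gives a simple sanity check before plugging in a more complex density.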

In order to improve mixing, we incorporate transitions that are tailor-made for the Candy model. Thus, we include a birth kernel that tends to add a segment in order to prolong the current network. The idea is that when adding a segment, preference should be given to positions that 'fit' the current configuration. More specifically, a new segment might be positioned in such a way that it is connected to an endpoint of a segment in the configuration, see [9]. For computational convenience, we only connect to segment endpoints that are sufficiently far from the boundary of $K$.

Another option is to include transition types other than births and deaths. For instance, in [2] change transitions that do not alter the number of segments are described. There are many valid choices for the proposal kernel. For instance, we may shift a segment center a bit, modify the orientation and/or the length, or even discard a segment altogether and generate a new one randomly. For more details see [9].


Model parameters
$\gamma_f = 0.0002$
$\gamma_s = 0.05$
$\gamma_d = 12.2$
$\gamma_o = 0.08$
$\gamma_r = 0.08$

Sufficient statistics
$n_f = 4$
$n_s = 34$
$n_d = 63$
$n_o = 12$
$n_r = 9$

Fig. 1. Realization (top) of the Candy model given by the parameters in the middle table. The observed values of the sufficient statistics are listed below.

In Figure 1 we present a sample of the Candy model, its parameters and the observed values of the sufficient statistics. We carried out $2 \times 10^7$ iterations. The sufficient statistics were recorded every $10^3$ iterations. The point space is $K = [0, 256] \times [0, 256]$ while marks take values in $[30, 40] \times [0, \pi)$. The weights of the different transition kernels were fixed empirically. The Candy model is very complex, hence it is difficult to assess convergence. However, we may analyze the evolution of the cumulative means of the sufficient statistics during the simulation. These are plotted in Figure 2.
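The cumulative means used as a convergence diagnostic are straightforward to compute from the recorded statistics. The sketch below assumes the recorded values are stored one row per sampling instant and one column per statistic; the numbers are made up for illustration.

```python
import numpy as np

def cumulative_means(samples):
    """Running mean of each column: row m holds the mean of the first m + 1 samples."""
    samples = np.asarray(samples, dtype=float)
    steps = np.arange(1, samples.shape[0] + 1).reshape(-1, 1)
    return np.cumsum(samples, axis=0) / steps

# Hypothetical recorded values of (n_f, n_s) at three sampling instants.
trace = [[4, 30], [2, 28], [6, 29]]
means = cumulative_means(trace)
```

A curve that is still drifting at the end of the run suggests that more iterations, or better mixing moves, are needed.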


$\bar{n}_f = 3.64$, $\bar{n}_s = 28.63$, $\bar{n}_d = 64.07$, $\bar{n}_r = 7.82$

Fig. 2. Evolution of the empirical moments of the sufficient statistics during the simulation of the Candy model. The cumulative means $\bar{n}_f, \bar{n}_s, \bar{n}_d, \bar{n}_o, \bar{n}_r$ (from top to bottom) are plotted as a function of the number of iterations.

3.  MAXIMUM  LIKELIHOOD  ESTIMATION 

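The Candy model (1) is a five-parameter exponential family

$$p_\omega(\mathbf{s}) = \alpha(\omega) \exp\left[t(\mathbf{s})^T \omega\right] h(\mathbf{s})$$

with canonical sufficient statistic

$$t(\mathbf{s}) = (n_f(\mathbf{s}), n_s(\mathbf{s}), n_d(\mathbf{s}), n_o(\mathbf{s}), n_r(\mathbf{s}))^T,$$

parameter vector

$$\omega = (\log \gamma_f, \log \gamma_s, \log \gamma_d, \log \gamma_o, \log \gamma_r)^T,$$

and $h(\mathbf{s}) = \prod_{i=1}^{n} \exp\left[\frac{l_i - l_{\max}}{l_{\max}}\right]$. Using the importance sampling ideas outlined in [4, 5], the ratio of normalizing constants can be expressed as

$$\alpha(\omega_0) / \alpha(\omega) = E_{\omega_0} \exp\left[t(S)^T(\omega - \omega_0)\right]$$

and the log likelihood ratio with respect to some reference value $\omega_0$ can be written as

$$l(\omega) = \log \frac{p_\omega(\mathbf{s})}{p_{\omega_0}(\mathbf{s})} = t(\mathbf{s})^T(\omega - \omega_0) - \log E_{\omega_0} \exp\left[t(S)^T(\omega - \omega_0)\right]. \qquad (2)$$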

The score equations $\nabla l(\omega) = t(\mathbf{s}) - E_\omega t(S)$ and Fisher information matrix $I(\omega) = -\nabla^2 l(\omega) = \mathrm{Var}_\omega\, t(S)$ are easily derived, so that under the maximum likelihood estimator $\hat{\omega}$, the expected values of the sufficient statistics must be equal to the observed values. Now, since the covariance matrix of $t(S)$ is positive definite, (2) is concave in $\omega$. Therefore, provided the score equations have a solution $\hat{\omega}$ in $\mathbb{R}^3 \times (-\infty, 0)^2$, a unique maximum likelihood estimator exists and equals $\hat{\omega}$. Otherwise, a maximum may be found on the boundary of the parameter space.

Numerically, the expectation in (2) can be approximated [4, 5] by its Monte Carlo counterpart

$$\frac{1}{n} \sum_{i=1}^{n} \exp\left[t(S_i)^T(\omega - \omega_0)\right]$$

based on a single sample $S_1, \ldots, S_n$ from $p_{\omega_0}$.
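A numerically stable way to evaluate the Monte Carlo log likelihood ratio and its gradient is sketched below. The function names are hypothetical, and the sample of sufficient statistics is a stand-in drawn from independent Poisson distributions purely so that the code runs; in practice each row would be the statistic vector of one simulated configuration.

```python
import numpy as np

def mc_log_likelihood_ratio(t_obs, t_samples, omega, omega0):
    """Observed term t_obs^T d minus log of the sample mean of exp(t_i^T d), d = omega - omega0."""
    d = np.asarray(omega, dtype=float) - np.asarray(omega0, dtype=float)
    a = np.asarray(t_samples, dtype=float) @ d
    a_max = a.max()                                   # log-sum-exp shift for stability
    log_mean_exp = a_max + np.log(np.mean(np.exp(a - a_max)))
    return float(np.asarray(t_obs, dtype=float) @ d - log_mean_exp)

def mc_score(t_obs, t_samples, omega, omega0):
    """Gradient of the above: t_obs minus the importance-weighted mean of the samples."""
    d = np.asarray(omega, dtype=float) - np.asarray(omega0, dtype=float)
    t_samples = np.asarray(t_samples, dtype=float)
    a = t_samples @ d
    w = np.exp(a - a.max())
    w /= w.sum()
    return np.asarray(t_obs, dtype=float) - w @ t_samples

rng = np.random.default_rng(0)
omega0 = np.log([0.0002, 0.05, 12.2, 0.08, 0.08])
t_obs = np.array([4.0, 34.0, 63.0, 12.0, 9.0])
# Stand-in for statistics of configurations simulated under omega0.
t_samples = rng.poisson([4, 30, 60, 12, 9], size=(1000, 5)).astype(float)
value_at_reference = mc_log_likelihood_ratio(t_obs, t_samples, omega0, omega0)
score_at_reference = mc_score(t_obs, t_samples, omega0, omega0)
```

At the reference value the ratio is zero by construction and the score reduces to the observed statistic minus the plain sample mean, which is a convenient check on any implementation.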

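Considering the true unknown MLE $\hat{\omega}$, due to [2, Theorem 7] the Monte Carlo maximum likelihood estimator $\hat{\omega}_n$ is consistent and satisfies the central limit theorem:

$$\sqrt{n}\,(\hat{\omega}_n - \hat{\omega}) \to \mathcal{N}\left(0,\; I(\hat{\omega})^{-1}\, \Sigma\, I(\hat{\omega})^{-1}\right)$$

where $\Sigma$ is the asymptotic covariance matrix of the normalized Monte Carlo score $\sqrt{n}\, \nabla l_n(\hat{\omega})$ and $I(\hat{\omega})$ denotes the Fisher information matrix at the maximum likelihood estimator.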

However, the method described above relies on a reference value $\omega_0$ that is not too far from the maximum likelihood estimator. Here we used the iterative gradient method [1]

$$l_n\big(\omega_k + \rho(\omega_k) \nabla l_n(\omega_k)\big) = \max_{\rho \in \mathbb{R}} l_n\big(\omega_k + \rho \nabla l_n(\omega_k)\big) \qquad (3)$$

$$\omega_{k+1} = \omega_k + \rho(\omega_k) \nabla l_n(\omega_k)$$

to find a reasonable value. Here $\rho(\omega_k)$ is the optimal step, which is computed using a one-dimensional maximization of the likelihood function.
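The iteration can be sketched as gradient ascent with a one-dimensional search for the step length. This is only an illustration: the step is restricted to a bounded interval `[0, rho_max]` and found by golden-section search, both of which are assumptions rather than details from the text, and the objective is a toy concave quadratic standing in for the Monte Carlo log likelihood.

```python
import numpy as np

def golden_section_max(phi, lo, hi, iters=60):
    """Maximise a unimodal function phi on [lo, hi] by golden-section search."""
    g = (np.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    for _ in range(iters):
        c = b - g * (b - a)
        d = a + g * (b - a)
        if phi(c) > phi(d):
            b = d                      # the maximum lies in [a, d]
        else:
            a = c                      # the maximum lies in [c, b]
    return 0.5 * (a + b)

def gradient_ascent(f, grad, w0, n_steps, rho_max=1.0):
    """Repeat w <- w + rho * grad(w), with rho chosen by a line search along the gradient."""
    w = np.asarray(w0, dtype=float)
    for _ in range(n_steps):
        g = grad(w)
        rho = golden_section_max(lambda r: f(w + r * g), 0.0, rho_max)
        w = w + rho * g
    return w

# Toy concave objective with maximiser (1, -2); not the Candy log likelihood.
f = lambda w: -(w[0] - 1.0) ** 2 - 10.0 * (w[1] + 2.0) ** 2
grad = lambda w: np.array([-2.0 * (w[0] - 1.0), -20.0 * (w[1] + 2.0)])
w_hat = gradient_ascent(f, grad, [0.0, 0.0], 50)
```

Because the log likelihood ratio is concave, each one-dimensional restriction is unimodal, which is the property golden-section search relies on.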


We implemented the procedure for the data of Figure 1. Starting with some arbitrary initial values (see Figure 3, first column) we ran (3) for 1000 steps to obtain the vector $\omega_0$ listed in the second column of Figure 3. Based on a sample of size $n = 2 \times 10^7$ from $p_{\omega_0}$, we calculated the Monte Carlo approximation $l_n(\omega)$, cross sections of which are shown in Figure 5. The maximum of $l_n(\omega)$ is located at $\hat{\omega}_n$ as listed in Figure 3 (third column).

In Figure 4 we show the asymptotic standard deviation of the true MLE and the Monte Carlo Standard Error (MCSE), which approximates the difference between the unknown MLE and its Monte Carlo approximation. We notice that by increasing n, we can make the MCSE negligible.


Parameter     Initial parameters    Iterative method    Monte Carlo MLE
$\omega_f$    -9.5                  -8.37               -8.32
$\omega_s$    -4.0                  -2.74               -2.73
$\omega_d$    1.5                   2.46                2.47
$\omega_o$    -3.5                  -2.13               -2.17
$\omega_r$    -3.5                  -2.42               -2.42

Fig. 3. Estimation of the parameters for the data of Figure 1.


Asymptotic standard deviation of MLE    MCSE
0.51                                    0.002
0.23                                    0.003
0.17                                    0.001
0.30                                    0.002
0.31                                    0.005

Fig. 4. Estimation errors.


4.  CONCLUSION  AND  FUTURE  WORK 

In  practice,  the  main  challenges  in  working  with  point  pro¬ 
cesses  are  the  following:  to  build  appropriate  moves,  to  find 
the  optimal  way  of  combining  them  into  a  simulation  algo¬ 
rithm,  and  to  carry  out  statistical  inference.  Here,  we  have 
built  a  Metropolis-Hastings  sampler,  that  combines  uniform 
birth  and  death  proposals  that  guarantee  the  convergence  of 
the  Markov  chain  to  the  target  equilibrium  distribution  (1) 
with  transitions  designed  to  exploit  specific  characteristics 
of  the  model,  in  our  case  connectivity  properties. 

The main application of the Candy model is that of thin network extraction. This was the topic of [10, 11], where results were obtained using fixed parameters as well as approximations to the Metropolis-Hastings proposal kernels and acceptance probabilities. The results here, and in [9], remove the need for approximate sampling, and may be a starting point for unsupervised network extraction.
Fig. 5. Monte Carlo approximation of the log likelihood function. The X axis represents the variation of a single component. The Y axis represents the values of the Monte Carlo log likelihood with all other components of $\omega_0$ fixed.
ABSTRACT

The source detection problem in array processing can be considered a test for equality of eigenvalues. This approach is implemented through a multiple hypothesis procedure which compares all pairwise differences between eigenvalues. A resampling procedure is used to estimate the null distributions of the test statistics, an advantage for small sample sizes or non-Gaussian signals since traditional techniques such as the MDL assume Gaussianity. Simulations show the increased performance of the test compared to the MDL for small samples or non-Gaussian signals, with a noticeable improvement over the more accurate sphericity test.

Keywords: array source detection, resampling, bootstrap, multiple hypothesis tests, model selection


1. INTRODUCTION

Source detection is an important first step in array processing. Model order selection procedures based on information theoretic criteria such as Rissanen's minimum description length (MDL) [1] are well known. Alternatively, the problem can be cast as a hypothesis test for equality of the smallest sample eigenvalues; the sphericity test is such a procedure [2]. Both these methods are based on large samples and Gaussian signals, otherwise their behaviour may be unpredictable since the distribution of the sample eigenvalues can be sensitive to departures from Gaussianity [3].

Here a multiple hypothesis procedure is used to compare all pairwise differences of the eigenvalues to test for the number of sources. A similar approach has demonstrated significant potential for improved performance over the MDL [4]. Though conceptually similar to the sphericity test, we estimate the finite sample distributions of the test statistic by resampling, rather than using asymptotic approximations. In these cases we can then achieve improved performance.

Bias in the sample eigenvalues has a significant effect on this inference, notably when population eigenvalues are not well separated, as for small samples or low SNR. A bias correction based on the expectation of the sample eigenvalues is proposed. Performance is similar to resampling methods, but at a greatly reduced computational cost. With bias estimates incorporated into the detection procedure it is possible to obtain an improvement in performance over the MDL and the sphericity test, itself a general improvement over the MDL.

This work was in part supported by the Australian Telecommunications Cooperative Research Centre (AT-CRC).


2. DATA MODEL

The setting for the source detection problem is as follows: $n$ snapshots of i.i.d. zero mean complex data are received from a $p$ element array,

$$x(t) = A s(t) + v(t), \quad t = 1, \ldots, n$$

where $A$ is the $p \times q$ array steering matrix, $s(t)$ is a $q$ ($q < p$) vector valued white source signal and $v(t)$ is noise with covariance $\sigma^2 I$. Assuming $s(t)$ and $v(t)$ are independent, the array covariance is

$$R = E\left[x(t) x^H(t)\right] = A R_s A^H + \sigma^2 I$$

where $R_s$ is the covariance of the sources. The eigenvalues of $R$ are denoted

$$\lambda_1 > \cdots > \lambda_q > \lambda_{q+1} = \cdots = \lambda_p = \sigma^2, \qquad (1)$$

so that the smallest $p - q$ population eigenvalues are equal. These will be referred to as multiple or noise eigenvalues, the others as distinct or source eigenvalues. The problem is then one of determining the multiplicity of the smallest sample eigenvalues,

$$l_1 > \cdots > l_p > 0,$$

of the sample array covariance,

$$\hat{R} = \frac{1}{n} \sum_{t=1}^{n} x(t) x(t)^H.$$
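The data model can be simulated in a few lines. The sketch below is illustrative only: it assumes a uniform linear array with half-wavelength spacing for the steering matrix and circular complex Gaussian sources and noise, none of which is specified by the model itself, and all function names are hypothetical.

```python
import numpy as np

def steering_matrix(p, angles_deg):
    """p x q steering matrix of a uniform linear array, half-wavelength spacing (an assumption)."""
    angles = np.deg2rad(np.asarray(angles_deg, dtype=float))
    sensors = np.arange(p).reshape(-1, 1)
    return np.exp(1j * np.pi * sensors * np.sin(angles))

def complex_gaussian(rng, shape, power):
    """Circular complex Gaussian samples with the given variance."""
    return np.sqrt(power / 2.0) * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))

def sample_eigenvalues(x):
    """Eigenvalues of the sample covariance (1/n) * x x^H, sorted in decreasing order."""
    n = x.shape[1]
    r_hat = x @ x.conj().T / n
    return np.sort(np.linalg.eigvalsh(r_hat))[::-1]

rng = np.random.default_rng(0)
p, n = 4, 200
a = steering_matrix(p, [10.0, 30.0, 50.0])       # three sources
s = complex_gaussian(rng, (3, n), power=4.0)
v = complex_gaussian(rng, (p, n), power=1.0)     # unit noise variance
eig = sample_eigenvalues(a @ s + v)
```

Even though the smallest population eigenvalues are exactly equal, the sample eigenvalues are all distinct, which is why a statistical test is needed to decide their multiplicity.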


We  now  present  the  proposed  multiple  hypothesis  procedure  for 
source  detection. 


3.  APPLICATION  TO  SOURCE  DETECTION 

From (1) it follows that we must test for equality of eigenvalues. By considering all possible pairwise comparisons between the eigenvalues we arrive at this set of hypotheses,

$$H_{ij}: \lambda_i = \lambda_j, \quad i = 1, \ldots, p - 1, \; j > i,$$
$$K_{ij}: \lambda_i \neq \lambda_j.$$

A hypothesis test for equality of the smallest $p - k$ eigenvalues can be obtained by combining these pairwise comparisons to give the new hypotheses $H_k$, $k = 0, \ldots, p - 2$,

$$H_k = \cap_{i,j} H_{ij}, \quad i = k + 1, \ldots, p - 1, \; j > i,$$

$$K_k = \cup_{i,j} K_{ij}.$$
Acceptance of all pairwise comparisons in $H_k$ implies the smallest $p - k$ eigenvalues are equal, or that there are $k$ sources. Given p-values for the pairwise comparisons, a sequentially rejective Bonferroni (SRB) procedure [5] is employed to test the hypotheses $H_k$ and estimate $q$ in the following manner,

1. Set $k = 0$.

2. Test $H_k$.

3. If $H_k$ is accepted then set $\hat{q} = k$ and stop.

4. If $H_k$ is rejected and $k < p - 1$ then set $k \to k + 1$ and return to step 2. Otherwise set $\hat{q} = p - 1$ and stop.
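The steps above can be sketched as follows, given a matrix of pairwise p-values. Two assumptions are made and should be read as such: the sequentially rejective Bonferroni test is implemented in Holm's form, and a hypothesis on the smallest eigenvalues is taken to be accepted when none of its pairwise comparisons is rejected. The loop runs over the values of k for which the combined hypothesis is defined (0 to p - 2). Function names are hypothetical.

```python
import numpy as np

def holm_reject(pvals, alpha):
    """Sequentially rejective Bonferroni (Holm) test; returns a rejection flag per p-value."""
    pvals = np.asarray(pvals, dtype=float)
    m = pvals.size
    reject = np.zeros(m, dtype=bool)
    for rank, idx in enumerate(np.argsort(pvals)):
        if pvals[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break                      # stop at the first non-rejection
    return reject

def estimate_num_sources(pval, alpha):
    """pval[i, j] (i < j, 0-based) is the p-value for equality of eigenvalues i and j."""
    p = pval.shape[0]
    for k in range(p - 1):             # k = 0, ..., p - 2
        pairs = [pval[i, j] for i in range(k, p - 1) for j in range(i + 1, p)]
        if not holm_reject(pairs, alpha).any():
            return k                   # smallest p - k eigenvalues judged equal
    return p - 1

# Illustrative p-values for p = 4: only the two smallest eigenvalues look equal.
pval = np.ones((4, 4))
pval[0, 1:] = 1e-4
pval[1, 2:] = 1e-4
pval[2, 3] = 0.6
q_hat = estimate_num_sources(pval, alpha=0.02)
```

In this example the hypotheses for k = 0 and k = 1 are rejected because they contain clearly significant comparisons, and the procedure stops at k = 2.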

The SRB procedure maintains the global level of significance, $\alpha$, defined as the probability that at least one of the hypotheses is rejected, given all are true. In this test the global null is $H_0$, that all eigenvalues are equal.

P-values for $H_{ij}$ are found using the bootstrap [6]. This resampling technique is used for several reasons: it avoids the need to know the distribution of the test statistic under the null and is valid for small samples and non-Gaussian data.

These  advantages  are  quite  important  when  working  with  eigen¬ 
values  since  their  distribution  is  too  complex  for  general  use  [7], 
while  asymptotic  expansions  [8]  may  not  be  valid  for  the  small 
sample  sizes  considered.  Asymptotic  approximations  developed 
for  non-Gaussian  cases  require  knowledge  of  the  higher  order  mo¬ 
ments  of  the  data,  which  are  difficult  to  estimate  well  for  small 
sample  sizes  [3,  9]. 

The  basic  premise  of  the  proposed  test  is  that  differences  be¬ 
tween  noise  eigenvalues  are  small.  For  finite  samples  the  eigenval¬ 
ues  are  biased,  the  amount  of  bias  increasing  as  the  separation  of 
population  eigenvalues  decreases.  Thus  differences  between  mul¬ 
tiple  eigenvalues  will  be  shifted  away  from  zero.  To  correct  for 
this  shift  we  must  estimate  the  bias  of  all  eigenvalues.  We  now 
consider  several  methods  for  bias  estimation. 


4.  BIAS  ESTIMATION 


4.1.  Lawley’s  Expansion 

Lawley developed an expression for the expectation of the distinct eigenvalues by considering the propagation of error from the sample covariance to the eigenvalues for Gaussian data [12]. From this the bias in $l_i$ is estimated as

$$\mathrm{Bias}_{\mathrm{LAW}}(l_i) = \frac{l_i}{n} \left( \sum_{j=1,\, j \neq i}^{q} \frac{l_j}{l_i - l_j} + \frac{(p - q)\,\hat{\sigma}^2}{l_i - \hat{\sigma}^2} \right), \qquad (2)$$

for $i = 1, \ldots, q$, where $\sigma^2$, the population multiple eigenvalue, is replaced with its maximum likelihood estimate under Gaussianity,

$$\hat{\sigma}^2 = \frac{1}{p - q} \sum_{j=q+1}^{p} l_j. \qquad (3)$$



Bias in the distinct sample eigenvalues is of order $O(n^{-1})$. Though a similar expression for multiple eigenvalues does not exist, extensive simulations have shown that the bias is of order $O(n^{-1/2})$.

After applying this bias correction, the distinct eigenvalues have a bias of order $O(n^{-2})$, while $\hat{\sigma}^2$ is unbiased under Gaussianity. Note these corrections can only be applied if $q$ is known and even then individual multiple eigenvalues cannot be corrected.


The estimate (2) is valid when the difference between successive distinct eigenvalues is large relative to the standard error, which is of order $O(n^{-1/2})$. If this condition is not fulfilled and $\lambda_i \approx \lambda_j$ for $i \neq j$, the variance of this estimator increases quickly. This follows intuitively by examining the denominator in the terms of the summation of (2). Similarly, if (2) were unknowingly applied to multiple eigenvalues the results would be unpredictable, as the assumption of well separated population eigenvalues is invalid.


4.2.  A  Robust  Bias  Estimate 

Based on Lawley's expansion we propose a bias estimate to overcome the aforementioned problems by taking a binomial expansion in the denominator of the summand of (2) and truncating to a finite number of terms. For simplicity, assume that all the population eigenvalues are distinct. Then the bias estimate for $l_i$ becomes

$$\mathrm{Bias}_{\mathrm{LBE}}(l_i) = \sum_{j=1,\, j \neq i}^{p} \begin{cases} \dfrac{l_j}{n} \displaystyle\sum_{k=0}^{K} \left(\dfrac{l_j}{l_i}\right)^k, & l_j < l_i, \\[2ex] -\dfrac{l_i}{n} \displaystyle\sum_{k=0}^{K} \left(\dfrac{l_i}{l_j}\right)^k, & l_j > l_i, \end{cases}$$

for some suitable $K$. If required, the upper limit on the outer summation can be changed to $q$ and the term corresponding to multiple eigenvalues from (2) included. Setting $K$ to a moderate value will retain the bias correction properties while guarding against large increases in variance when the population eigenvalues are not well separated or multiple eigenvalues are present. A value of $K = 25$ was found to be acceptable. Hence $\mathrm{Bias}_{\mathrm{LBE}}$ may be applied without any knowledge of the multiple eigenvalues by assuming $q$ to be $p$, and this bias estimate can be applied blindly to correct all eigenvalues irrespective of multiplicity issues.
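The effect of truncation can be checked numerically. The sketch below compares the untruncated sum, in which each term is l_i l_j / (n (l_i - l_j)) with all eigenvalues treated as distinct, against the version where each term is replaced by a geometric series cut off after K + 1 terms. Function names are hypothetical and the code is an illustration of the idea rather than the authors' implementation.

```python
import numpy as np

def bias_lawley(l, n):
    """(l_i / n) * sum over j != i of l_j / (l_i - l_j), all eigenvalues treated as distinct."""
    l = np.asarray(l, dtype=float)
    bias = np.zeros_like(l)
    for i, li in enumerate(l):
        others = np.delete(l, i)
        bias[i] = li / n * np.sum(others / (li - others))
    return bias

def bias_lbe(l, n, K=25):
    """Same sum with each term replaced by a geometric series truncated after K + 1 terms."""
    l = np.asarray(l, dtype=float)
    k = np.arange(K + 1)
    bias = np.zeros_like(l)
    for i, li in enumerate(l):
        for j, lj in enumerate(l):
            if j == i:
                continue
            if lj < li:
                bias[i] += lj / n * np.sum((lj / li) ** k)
            else:                      # lj > li (ties have probability zero for continuous data)
                bias[i] -= li / n * np.sum((li / lj) ** k)
    return bias

well_separated = [4.0, 3.0, 2.0, 1.0]
nearly_equal = [1.0001, 1.0]
full = bias_lawley(well_separated, n=50)
truncated = bias_lbe(well_separated, n=50, K=400)
```

For well separated eigenvalues the two estimates agree once K is large, while for nearly equal eigenvalues the truncated version stays bounded by roughly (K + 1) / n per term instead of blowing up with the reciprocal of the eigenvalue gap.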

An example with multivariate Gaussian data is shown in Figures 1 and 2 where the largest sample eigenvalue is considered and both corrections are applied. The data has a diagonal covariance matrix with population eigenvalues $(1.15, 1.1, 1.05, 1)'$. While the mean values of the corrected eigenvalues are very similar, there is a notable decrease in the variance when using $\mathrm{Bias}_{\mathrm{LBE}}$. For small sample sizes the decrease is significant as the separation of population eigenvalues is of the same order as the standard error.


Figure 1: Mean of the largest sample eigenvalue with no bias estimation (solid line), $\mathrm{Bias}_{\mathrm{LAW}}$ (dashed line) and $\mathrm{Bias}_{\mathrm{LBE}}$ (dotted line), versus sample size for multivariate Gaussian data, $p = 4$, with diagonal covariance matrix and population eigenvalues $(1.15, 1.1, 1.05, 1)'$.

Figure 2: Standard deviation of the largest sample eigenvalue with no bias estimation (solid line), $\mathrm{Bias}_{\mathrm{LAW}}$ (dashed line) and $\mathrm{Bias}_{\mathrm{LBE}}$ (dotted line), versus sample size for the same scenario as Figure 1.


4.3.  Resampling  Methods 

Resampling methods are valid in small sample scenarios and for non-Gaussian data, and may be applied blindly, that is, without having to know whether multiple eigenvalues are present. The two methods considered are the jackknife [6] and subsampling [10].

4.4.  Jackknife  Bias  Estimation 

In this jackknife procedure a jackknife data set is created by deleting a single sample from the original data set. Given a data set of size $n$ there are $n$ possible unique jackknife data sets. For each of these we recompute the statistic of interest yielding the estimates $l_i^*(b)$, $b = 1, \ldots, n$. The jackknife estimate of bias in $l_i$ is

$$\mathrm{Bias}_{\mathrm{JCK}}(l_i) = (n - 1) \left( \frac{1}{n} \sum_{b=1}^{n} l_i^*(b) - l_i \right),$$

where $l_i$ was estimated from the entire sample.
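A delete-one jackknife of the sample eigenvalues can be sketched as below. The bias routine is written for a generic statistic so that it can be checked on a case with a known answer; the data, sizes and function names are illustrative assumptions, with snapshots stored as columns.

```python
import numpy as np

def sample_eigenvalues(x):
    """Eigenvalues of the sample covariance (1/n) * x x^H, sorted in decreasing order."""
    n = x.shape[1]
    return np.sort(np.linalg.eigvalsh(x @ x.conj().T / n))[::-1]

def jackknife_bias(stat, x):
    """(n - 1) * (mean of the leave-one-out statistics - full-sample statistic)."""
    n = x.shape[1]
    full = stat(x)
    loo = np.array([stat(np.delete(x, b, axis=1)) for b in range(n)])
    return (n - 1) * (loo.mean(axis=0) - full)

# Real Gaussian data with population eigenvalues (4, 3, 2, 1), n = 40 snapshots.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 40)) * np.sqrt([[4.0], [3.0], [2.0], [1.0]])
bias = jackknife_bias(sample_eigenvalues, x)
corrected = sample_eigenvalues(x) - bias
```

The cost is n eigendecompositions per data set, which is the computational burden that the closed-form bias estimate of Section 4.2 avoids.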

4.5.  Subsampling  Bias  Estimation 

Subsampling is a generalisation of the jackknife, where instead of methodically removing one sample at a time, $d$ samples are removed to give a subsample of size $s = n - d$. The number of possible subsamples, $n!/(s!\,(n - s)!)$, may be very large, so a smaller number of $B$ subsamples are chosen at random.

As the subsamples are smaller in size than the original data set, rescaling is required to correct the subsampling estimate. Assume that eigenvalue bias is proportional to $1/\tau_n$, $\tau_n$ being a function of $n$. Then the bias estimate from a subsample of size $s$ is proportional to $1/\tau_s$, so rescaling is required to apply it to the original statistic. While the function $\tau_n$ is problem dependent, it is usually of the form $n^\beta$, $\beta \in (0, 1]$. The subsampling estimate of bias is

$$\mathrm{Bias}_{\mathrm{SUB}}(l_i) \approx \frac{\tau_r}{\tau_n} \left( \frac{1}{B} \sum_{b=1}^{B} l_i^*(b) - l_i \right),$$

where $r = sn/(n - s)$. If all possible subsamples are used then the approximation is replaced with equality. The reason $\tau_r$ is used instead of $\tau_s$ is that resampling is performed without replacement from a finite population, as opposed to with replacement from an infinite population, which is assumed with $\tau_s$ [11]. As an approximation $\beta \approx 1$ for distinct eigenvalues and $\beta \approx 0.5$ for multiple eigenvalues. Since the presence of either is unknown we must estimate $\beta$ for each eigenvalue. However, a simple way to avoid this is to set $s = n/2$, so that $\tau_r/\tau_n = 1$, independent of $\beta$.
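The subsampling estimate can be sketched in the same style. The code assumes the power-law form for the scaling function (size raised to the power beta) and the effective size r = s n / (n - s); the function names and the example data are hypothetical.

```python
import numpy as np

def sample_eigenvalues(x):
    """Eigenvalues of the sample covariance (1/n) * x x^H, sorted in decreasing order."""
    n = x.shape[1]
    return np.sort(np.linalg.eigvalsh(x @ x.conj().T / n))[::-1]

def rescale_factor(n, s, beta):
    """Ratio of scaling functions at r and n, with tau_m = m**beta and r = s * n / (n - s)."""
    r = s * n / (n - s)
    return (r / n) ** beta

def subsampling_bias(stat, x, s, n_sub, beta, rng):
    """Rescaled difference between the mean subsample statistic and the full-sample one."""
    n = x.shape[1]
    full = stat(x)
    subs = []
    for _ in range(n_sub):
        cols = rng.choice(n, size=s, replace=False)   # subsample without replacement
        subs.append(stat(x[:, cols]))
    return rescale_factor(n, s, beta) * (np.mean(subs, axis=0) - full)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 40)) * np.sqrt([[4.0], [3.0], [2.0], [1.0]])
# With s = n / 2 the rescaling factor is 1 whatever beta is.
bias_sub = subsampling_bias(sample_eigenvalues, x, s=20, n_sub=100, beta=0.5, rng=rng)
```

Choosing s = n - 1 with beta = 1 gives a factor of n - 1, which recovers the jackknife scaling, while s = n / 2 removes the dependence on beta altogether.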

Note  the  jackknife  and  subsampling  are  valid  in  a  wider  va¬ 
riety  of  situations  than  other  resampling  techniques  such  as  the 
bootstrap  [6],  with  subsampling  being  the  most  widely  applicable. 

An example with multivariate Gaussian data is shown in Figures 3 and 4 for the largest sample eigenvalue where both $\mathrm{Bias}_{\mathrm{JCK}}$ and $\mathrm{Bias}_{\mathrm{SUB}}$ are applied. The data has diagonal covariance matrix with population eigenvalues $(4, 3, 2, 1)'$. For $\mathrm{Bias}_{\mathrm{SUB}}$, $B = 100$ subsamples of size $s = n/2$ were chosen at random. Both methods behave very similarly and though $\mathrm{Bias}_{\mathrm{LBE}}$ is not shown here, it too produces very similar results in terms of the average bias estimate. Further experimentation has shown the subsampling estimator tends to have a slightly lower MSE.


Figure 3: Mean of the largest sample eigenvalue with no bias estimation (solid line), $\mathrm{Bias}_{\mathrm{JCK}}$ (dashed line) and $\mathrm{Bias}_{\mathrm{SUB}}$ (dotted line), versus sample size for multivariate Gaussian data, $p = 4$, with diagonal covariance matrix and population eigenvalues $(4, 3, 2, 1)'$.


5.  SIMULATIONS 

Figure 5 shows detection rates for a $p = 4$ element array with $q = 3$ sources at 10°, 30° and 50° from broadside at SNRs of -2, 2 and 6 dB respectively. All signals are Gaussian and the global level for both the resampling and sphericity tests was set to $\alpha = 0.02$. Parameters for resampling bias correction were the same as those used in 4.3; for the bootstrap, $B = 200$ resamples were taken. In this scenario there is an improvement in performance over both the MDL and sphericity test, most noticeable for small sample size. In Figure 6 we have a non-Gaussian scenario where the source is Laplacian and the SNR of the second source is varied for a sample size of $n = 50$. Again, there is an improvement over existing methods. Similar results are obtained for other non-Gaussian distributions, such as Gaussian mixtures.

Additional scenarios have shown that there is an improvement over the MDL in nearly all cases and a comparable or possibly superior performance to the sphericity test. This has to be weighed against the increase in computational complexity which increases in direct proportion to the number of times the data is resampled. Since $\mathrm{Bias}_{\mathrm{LBE}}$ does not use resampling it involves less computation than resampling methods. For the $B = 100$ resamples used here for bias estimation this represents a significant saving.

Figure 4: Standard deviation of the largest sample eigenvalue with no bias estimation (solid line), $\mathrm{Bias}_{\mathrm{JCK}}$ (dashed line) and $\mathrm{Bias}_{\mathrm{SUB}}$ (dotted line), versus sample size for the same scenario as Figure 3.

Figure 5: Detection rates versus sample size. MDL (solid line), sphericity test (dashed line), bootstrap methods using bias correction: $\mathrm{Bias}_{\mathrm{LBE}}$ (o), $\mathrm{Bias}_{\mathrm{JCK}}$ (×), $\mathrm{Bias}_{\mathrm{SUB}}$ (+).

6.  CONCLUSION 

The source detection problem was approached as a multiple
hypothesis test for equality of eigenvalues. It was shown that for
the cases of interest, such as small sample size, bias in the sample
eigenvalues is non-negligible and should be corrected for when
carrying out the test. An improved bias estimate was proposed
which overcomes the need to know the multiplicity of the
eigenvalues, performing well in spite of their presence. It is less
computationally intensive than resampling methods while achieving
similar performance. Results show that the proposed procedure can
yield improved performance compared to the MDL and sphericity
test in the non-Gaussian and small sample cases.


Figure 6: Detection rates versus SNR (dB) for Laplace sources
and non-Gaussian noise. MDL (-), sphericity test (--) and bootstrap
methods using bias correction, Bias_LBE (·o·), Bias_JCK (-x-),
Bias_SUB (-+-).

ABSTRACT

We consider the general signal-processing problem of learning
about certain attributes of interest from measurements. These
attributes, which may be time-varying (dynamic) or
time-invariant (static), can be anything that is relevant to the
physical processes that produce the measurements. In statistical
signal processing, imperfections or uncertainties in the physical
processes are described using probability models, and the
complete probabilistic solution to the problem is given by the
distribution of the attributes conditioned on all available
measurements (the posterior distribution).

We  describe  an  algorithm  for  computing  this  solution, 
especially  in  situations  with  many  measurements  or  low  signal- 
to-noise  ratios.  The  algorithm  combines  sequential  importance 
sampling  (SIS)  and  Markov  chain  Monte  Carlo  (MCMC)  so  as  to 
achieve  computational  efficiency  and  stability.  MCMC  is 
performed  sequentially  for  batches  of  measurements  whose  sizes 
are  determined  adaptively,  hence  the  name  sequential  MCMC 
filter. For measurements within a batch, SIS is performed. Thus,
bigger batch sizes mean that MCMC is performed less frequently.
SIS is computationally efficient but with a finite Monte Carlo
sample size, stability is not guaranteed indefinitely. MCMC is
therefore needed from time to time to “refresh” the Monte Carlo
sample, eliminating any errors that may have accumulated from
the  SIS  steps.  When  MCMC  is  performed,  it  does  not  start  from 
scratch  but  uses  the  most  recent  Monte  Carlo  sample  from  SIS  to 
construct  the  proposal  distribution.  Adaptive  batch  sizing  is 
based  on  a  Kullback-Leibler  distance  that  is  easy  to  compute.  By 
extending  the  algorithm  to  multiple  models,  the  sequential 
MCMC  filter  can  deal  simultaneously  with  the  dual  pillars  of 
statistical  signal  processing,  namely  detection  (more  generally, 
model  selection)  and  parameter  estimation. 

We  discuss  general  uses  of  the  sequential  MCMC  filter,  and 
demonstrate  its  use  for  simultaneous  weak  signal  detection  and 
parameter  estimation  in  a  real-data  experiment. 

1.  INTRODUCTION 

A  goal  of  signal  processing  is  to  learn  about  certain  attributes  of 
interest  from  measurements.  These  attributes  can  be  anything  that 
is  relevant  to  the  physical  processes  that  produce  the 
measurements.  For  example,  they  may  be  signal  attributes  such  as 
amplitude,  frequency  or  phase;  or  noise  attributes  such  as  noise 
power;  or  image  attributes  such  as  intensities  or  edges;  and  so  on. 
In statistical signal processing, imperfections or uncertainties in
the physical processes are described using probability models.
Whilst  the  complete  description  of  probabilistic  objects  is 


provided  by  distributions,  statistical  signal  processing  solutions 
have  predominantly  been  moment-based.  Typically,  these 
solutions  make  simplifying  approximations,  such  as  linearization 
and  the  use  of  convenient  distributions,  so  as  to  be  computable 
under  various  hardware  constraints.  Recently,  interest  in 
distribution-based  statistical  signal  processing  solutions  has 
grown  due  to  rapid  advances  in  computer  technology. 

Together  with  the  shift  in  focus  from  moments  to 
distributions  is  a  shift  towards  the  Bayesian  paradigm.  This  is 
natural  because  the  Bayesian  framework  is  the  mathematically 
consistent  and  coherent  framework  for  updating  distributions. 
This  theoretical  impetus  is  steadily  being  reinforced  by  the 
increasing  awareness  of  the  tangible  advantages  of  Bayesian 
techniques.  For  example,  the  ease  with  which  prior  information 
and  domain  knowledge  can  be  incorporated;  the  finite-sample 
optimality  properties  of  Bayesian  solutions;  the  relevance  of  the 
solutions to the problems at hand (as opposed to “in the long run”
or “on the average”); and the built-in Ockham effect (which
penalizes model complexity and hence prevents model
over-fitting) offered by Bayesian model selection. Furthermore, in
certain applications, the Bayesian framework can unify
signal-processing tasks that are conventionally regarded as separate.
This  unifying  feature  of  the  Bayesian  framework  may  be  referred 
to  as  simultaneity. 

Bayesian  ideas  are  not  new  but  until  about  a  decade  ago,  they 
remained  largely  of  academic  interest  due  to  the  difficulty  of 
computing  Bayesian  solutions  for  real-world  problems.  Such 
problems  often  involve  complex  models  that  have  been 
painstakingly  derived  by  the  domain  experts.  Today,  the  tables 
seem  to  have  turned.  In  fact,  the  Bayesian  approach  is  emerging 
as  the  one  that  can  effectively  handle  complex  models.  Not  only 
are  these  complex  models  no  longer  a  hindrance,  the  Bayesian 
approach  actually  preserves  them  (no  model  simplification  is 
made)  and  uses  them  to  advantage  (valuable  domain  knowledge). 
All  this  has  come  about  because  the  availability  of  powerful 
computers  has  encouraged  research  into  computer-intensive 
methods  for  Bayesian  computations.  These  are  essentially  Monte 
Carlo  or  sample-based  methods  that  represent  a  distribution  of 
interest  by  a  sufficiently  large  number  of  computer-generated 
sample  points.  Of  these  methods,  the  one  with  the  greatest 
impact,  making  Bayesian  computations  with  complex  models 
possible,  is  Markov  chain  Monte  Carlo  (MCMC)  (see  [1]  for  a 
recent  review).  James  Berger,  a  renowned  and  respected 
statistician,  goes  so  far  as  to  say  [2],  “The  Bayesian  'machine,' 
together with MCMC, is arguably the most powerful mechanism
ever created for processing data and knowledge.” In signal
processing, as in other application domains, we are learning how
to utilize the full power of this “mechanism”.

2. THE SEQUENTIAL MCMC FILTER

We begin with data in the form of a sequence of measurements,
$x_1, x_2, \ldots \in \mathbb{R}^{d_x}$, where $d_x$ is the measurement
dimension. Our goal is to use the data to estimate, at each time-step,
certain unknown attributes of interest, which are pertinent to the
physical processes that produce the measurements. Some of these
attributes may be time-invariant. We group these together and
represent them by the parameter vector, $\psi \in \mathbb{R}^{d_\psi}$, with
dimension $d_\psi$. There may also be attributes that are
time-varying. We represent these by a sequence of state vectors,
$\theta_1, \theta_2, \ldots \in \mathbb{R}^{d_\theta}$, where $d_\theta$ is the state
dimension. We define the cumulative state at time-step $k$ to be
$\Theta_k = (\psi, \theta_1, \theta_2, \ldots, \theta_k)$, with $\Theta_0 = \psi$. By letting
$X_k = (x_1, \ldots, x_k)$, our problem then is to estimate $\Theta_k$ using
$X_k$, for $k = 1, 2, \ldots$. For each time-step, the complete
probabilistic solution is given by the joint distribution of
$\Theta_k$ conditionally given $X_k$.

Henceforth, we assume that all distributions are continuous
so that their associated densities exist. In the discrete case, mass
functions should replace densities and summations should replace
integrals wherever appropriate. For generic random vectors $y$ and
$z$, we use $p(y)$ to denote the density of $y$, and let $p(y | z)$ denote
the conditional density of $y$ given $z$. With this notation, the
solutions that we seek can be written as $p(\Theta_1 | X_1)$,
$p(\Theta_2 | X_2)$, $\ldots$.

Formally, using Bayes’ theorem, we have

$p(\Theta_k | X_k) \propto p(x_k | \Theta_k, X_{k-1}) \, p(\Theta_k | X_{k-1})$
$\qquad = p(x_k | \Theta_k, X_{k-1}) \, p(\theta_k | \Theta_{k-1}, X_{k-1}) \, p(\Theta_{k-1} | X_{k-1})$, (1)

which indicates that the desired solution for the current time-step,
$k$, can be obtained from the solution for the preceding time-step,
$k - 1$, by incorporating new information brought in by the current
measurement through the likelihood, $p(x_k | \Theta_k, X_{k-1})$. We refer
to this sequential updating formula as the general Bayesian filter
(GBF). An effective way to implement the GBF is to use
adequately large Monte Carlo samples (random or weighted) to
represent distributions. This approach is known as Monte Carlo
filtering or particle filtering or sequential Monte Carlo. The
ability of a sufficiently large sample to provide an arbitrarily
close estimate of a distribution is established by the
Glivenko-Cantelli Theorem, which states that the empirical
distribution function converges almost surely and uniformly to the
true underlying distribution function as the sample size increases
(see, for example, [3]). Also, convergence of sample averages of
integrable functions of the sample points to their respective
expectations follows from the Laws of Large Numbers [3]. In
many ways, a Monte Carlo sample representing $p(\Theta_k | X_k)$
makes it easier to conduct inference with $p(\Theta_k | X_k)$.

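To illustrate the Monte Carlo filtering idea, suppose we have
a weighted sample of size $n$, $\Theta_{k-1,1}, \ldots, \Theta_{k-1,n}$, with weights
$\omega_{k-1,1}, \ldots, \omega_{k-1,n}$, representing $p(\Theta_{k-1} | X_{k-1})$. We denote
such a weighted sample by
$(\Theta_{k-1,1}, \omega_{k-1,1}), \ldots, (\Theta_{k-1,n}, \omega_{k-1,n})$. Then (1)
suggests that one way to get a weighted sample,
$(\Theta_{k,1}, \omega_{k,1}), \ldots, (\Theta_{k,n}, \omega_{k,n})$, that represents
$p(\Theta_k | X_k)$ is to generate $\theta_{k,j}$ from
$p(\theta_k | \Theta_{k-1,j}, X_{k-1})$, augment it to $\Theta_{k-1,j}$ to
form $\Theta_{k,j}$, i.e.

$\Theta_{k,j} = (\Theta_{k-1,j}, \theta_{k,j})$, (2)

and then compute its updated weight by

$\omega_{k,j} \propto p(x_k | \Theta_{k,j}, X_{k-1}) \, \omega_{k-1,j}$. (3)

We shall refer to this method of obtaining the desired Monte
Carlo sample as sequential importance sampling (SIS) [4]. In
practice, SIS is easy to use because $p(\theta_k | \Theta_{k-1}, X_{k-1})$ and
$p(x_k | \Theta_k, X_{k-1})$ are usually readily available.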
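
The weight update (3) can be sketched in a few lines of Python for the parameter-only case, where there is no dynamic state to generate and each sample point is simply a candidate value of the parameter. Everything specific here is an assumption made for illustration: the unit-variance Gaussian likelihood, the N(0, 3^2) prior, the synthetic data and the names `sis_update` and `gaussian_loglik` do not come from the paper. Weights are kept in the log domain to avoid underflow.

```python
import numpy as np

def sis_update(log_weights, points, x_new, log_likelihood):
    """One SIS step: multiply each weight by the likelihood of x_new, then normalize."""
    log_w = log_weights + log_likelihood(x_new, points)
    return log_w - np.logaddexp.reduce(log_w)   # subtract log of the sum of weights

def gaussian_loglik(x, mu):
    # Assumed model x ~ N(mu, 1); additive constants cancel in the normalization.
    return -0.5 * (x - mu) ** 2

rng = np.random.default_rng(0)
points = rng.normal(0.0, 3.0, size=2000)             # draws from the assumed prior
log_w = np.full(points.size, -np.log(points.size))   # equal initial weights
for x in rng.normal(1.5, 1.0, size=20):              # synthetic measurements, true mean 1.5
    log_w = sis_update(log_w, points, x, gaussian_loglik)

weights = np.exp(log_w)
posterior_mean = np.sum(weights * points)
```

The weighted average `posterior_mean` is the sample-based estimate of the posterior mean after all 20 measurements.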

A  problem  that  arises  with  SIS  is  that  with  a  finite  sample,  the 
weights  become  increasingly  skewed  over  time,  adversely 
affecting  the  sample’s  ability  to  adequately  represent  the 
distribution.  This  phenomenon  is  known  as  sample  degeneration 
and  various  schemes  have  been  suggested  to  mitigate  it.  In  [4], 
the  authors  propose  a  general  framework  that  unifies  many 
existing  schemes.  They  also  suggest  a  generic  Monte  Carlo 
filtering  algorithm  that  first  checks  the  skewness  of  the  weights 
and  then  performs  SIS  if  they  are  not  too  skewed,  but  otherwise 
performs  SIS  with  resampling  to  counter  sample  degeneration. 
The  skewness  check  is  based  on  an  “effective  sample  size”,  which 
is computed from the coefficient of variation of the weights [5].
With a finite Monte Carlo sample size, the generic algorithm (and
hence  all  of  its  particular  realizations  as  well)  has  been  shown  to 
delay  degeneration  but  there  is  no  guarantee  that  the  problem  is 
resolved  entirely.  We  have  not  seen  any  demonstration  of  its 
stability  for  long  measurement  sequences  or  for  low  SNR, 
situations  that  are  frequently  encountered  in  signal  processing. 
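
As an illustration of the skewness check, the sketch below computes an effective sample size from the coefficient of variation of the weights. It uses the commonly quoted form ESS = n / (1 + cv^2), which for normalized weights equals the reciprocal of the sum of squared weights; whether this matches the exact definition in [5] is an assumption, and the function name is hypothetical.

```python
import numpy as np

def effective_sample_size(weights):
    """ESS = n / (1 + cv^2), with cv the coefficient of variation of the weights."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                        # normalize defensively
    cv2 = np.var(w) / np.mean(w) ** 2      # squared coefficient of variation
    return w.size / (1.0 + cv2)

ess_uniform = effective_sample_size([0.25, 0.25, 0.25, 0.25])
ess_degenerate = effective_sample_size([1.0, 0.0, 0.0, 0.0])
```

Equal weights give an effective sample size equal to the number of points, while a sample whose weight sits on one point has an effective size of one.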

Our  sequential  MCMC  filter  has  a  somewhat  similar  generic 
structure  but  the  details  differ.  We  perform  SIS  and  check 
whether  the  resulting  Monte  Carlo  sample  provides  an  adequate 
representation  of  the  distribution  of  interest.  If  it  does,  we 
proceed  with  SIS  for  the  next  time-step;  otherwise,  we  perform 
MCMC  with  the  desired  distribution  as  target  distribution  and 
with  a  proposal  distribution  that  is  constructed  from  the  Monte 
Carlo  sample  produced  by  SIS.  Whilst  the  generic  algorithm  in 
[4]  counters  degeneration  by  resampling,  we  use  MCMC  to  avoid 
the  drawbacks  of  resampling  such  as  increase  in  random  variation 
in  the  resulting  sample  and  decrease  in  diversity  of  the  sample 
points.  The  need  for  a  full  MCMC  from  time  to  time  has  been 
alluded  to  in  [6]  -  it  “refreshes”  the  Monte  Carlo  sample  and 
removes  any  approximation  errors  that  may  have  accumulated 
from  the  SIS  steps  due  to  the  finite  sample  size.  Consequently, 
the  sequential  MCMC  filter  is  guaranteed  to  be  stable  as  long  as 
there  are  enough  resources  to  perform  the  MCMC  properly. 
Unlike  static  MCMC  schemes  that  perform  MCMC  from  scratch 
at  each  and  every  time-step  [7],  we  do  not  need  MCMC  at  every 
time-step  and  we  do  not  start  MCMC  from  scratch  but  use  the 
most  recent  Monte  Carlo  sample  from  SIS  to  construct  the 
proposal  distribution. 

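To summarize, our sequential MCMC filter has the following
steps:

1. Start of time-step $k+1$: we have
   $(\Theta_{k,1}, \omega_{k,1}), \ldots, (\Theta_{k,n_k}, \omega_{k,n_k})$ from $p(\Theta_k | X_k)$.
   Set $m = 1$.

2. SIS: Obtain $(\Theta_{k+m,1}, \omega_{k+m,1}), \ldots, (\Theta_{k+m,n_k}, \omega_{k+m,n_k})$ by

   $\theta_{k+m,j} \sim p(\theta_{k+m} | \Theta_{k+m-1,j}, X_{k+m-1})$,
   $\Theta_{k+m,j} = (\Theta_{k+m-1,j}, \theta_{k+m,j})$,
   $\omega_{k+m,j} \propto p(x_{k+1}, \ldots, x_{k+m} | \Theta_{k+m,j}, X_k) \, \omega_{k,j}$,

   for $j = 1, \ldots, n_k$.

3. If $(\Theta_{k+m,1}, \omega_{k+m,1}), \ldots, (\Theta_{k+m,n_k}, \omega_{k+m,n_k})$ adequately
   represent $p(\Theta_{k+m} | X_{k+m})$,

      Set $m = m + 1$. Go to Step 2.

   Else

      (a) Construct $\hat{p}(\Theta_{k+m} | X_{k+m})$ using
          $(\Theta_{k+m,1}, \omega_{k+m,1}), \ldots, (\Theta_{k+m,n_k}, \omega_{k+m,n_k})$.

      (b) MCMC: Obtain
          $(\Theta_{k+m,1}, \omega_{k+m,1}), \ldots, (\Theta_{k+m,n_{k+m}}, \omega_{k+m,n_{k+m}})$
          by MCMC with $p(\Theta_{k+m} | X_{k+m})$ as target density and
          $\hat{p}(\Theta_{k+m} | X_{k+m})$ as proposal density.

      (c) Set $k = k + m$. Go to Step 1.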

Notice that in general the size of the Monte Carlo sample need
not remain the same, hence the use of $n_k$ and $n_{k+m}$. For
checking whether $(\Theta_{k+m,1}, \omega_{k+m,1}), \ldots, (\Theta_{k+m,n_k}, \omega_{k+m,n_k})$
adequately represent $p(\Theta_{k+m} | X_{k+m})$ in Step 3, we actually
measure how well $p(\Theta_{k+m} | X_k)$ “predicts” $p(\Theta_{k+m} | X_{k+m})$ by
computing the Kullback-Leibler distance between their two
respective representative samples,
$(\Theta_{k+m,1}, \omega_{k,1}), \ldots, (\Theta_{k+m,n_k}, \omega_{k,n_k})$ and
$(\Theta_{k+m,1}, \omega_{k+m,1}), \ldots, (\Theta_{k+m,n_k}, \omega_{k+m,n_k})$:

$K(\omega_{k+m}, \omega_k) = \sum_{j=1}^{n_k} \omega_{k+m,j} (\log \omega_{k+m,j} - \log \omega_{k,j})$. (4)

We refer to the number $m$ when MCMC is required as the batch
size. It is determined adaptively by specifying a threshold for
$K(\omega_{k+m}, \omega_k)$. Bigger batches mean that MCMC is performed
less frequently. Thus, MCMC is performed sequentially for
batches of measurements, hence the name sequential MCMC
filter. Within a batch, SIS is performed.
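
The distance in (4) depends only on the two sets of weights, which is what makes it cheap to evaluate. A minimal Python sketch follows; the function name and the example weights are illustrative assumptions, and terms with a zero new weight are skipped since they contribute nothing to the sum.

```python
import numpy as np

def kl_between_weights(w_new, w_old):
    """Kullback-Leibler distance (4) between two normalized weight vectors."""
    w_new = np.asarray(w_new, dtype=float)
    w_old = np.asarray(w_old, dtype=float)
    mask = w_new > 0                       # 0 * log(0) is taken as 0
    return float(np.sum(w_new[mask] * (np.log(w_new[mask]) - np.log(w_old[mask]))))

same = kl_between_weights([0.5, 0.5], [0.5, 0.5])
skewed = kl_between_weights([0.9, 0.1], [0.5, 0.5])
```

A batch would be closed, and MCMC triggered, once this value exceeds a user-chosen threshold.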

Finally,  any  convenient  MCMC  procedure  can  be  used  in 
Step  3(b). 
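
The steps above can be sketched end to end for a toy parameter-only problem. This is not the authors' implementation: the model (x ~ N(mu, 1) with an N(0, 3^2) prior on mu), the independence Metropolis-Hastings sampler, the Gaussian proposal fitted to the weighted sample with its standard deviation doubled, the burn-in length and the threshold value are all assumptions chosen to keep the example short.

```python
import numpy as np

def log_post(mu, data, prior_sd=3.0):
    # Unnormalized log posterior for the assumed model x ~ N(mu, 1), mu ~ N(0, prior_sd^2).
    return -0.5 * (mu / prior_sd) ** 2 - 0.5 * np.sum((data - mu) ** 2)

def independence_mh(data, mean, sd, n, rng, burn=200):
    """Step 3(b): independence Metropolis-Hastings with proposal N(mean, sd^2)."""
    log_q = lambda m: -0.5 * ((m - mean) / sd) ** 2
    current = mean
    lp, lq = log_post(current, data), log_q(current)
    out = np.empty(n)
    for i in range(burn + n):
        cand = rng.normal(mean, sd)
        lp_c, lq_c = log_post(cand, data), log_q(cand)
        # Accept with probability min(1, [target/proposal at cand] / [same at current]).
        if np.log(rng.uniform()) < (lp_c - lq_c) - (lp - lq):
            current, lp, lq = cand, lp_c, lq_c
        if i >= burn:
            out[i - burn] = current
    return out

def sequential_mcmc_filter(data, n=500, threshold=0.5, seed=0):
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    points = rng.normal(0.0, 3.0, size=n)     # initial sample from the prior
    log_w_ref = np.full(n, -np.log(n))        # weights at the start of the batch
    log_w = log_w_ref.copy()
    mcmc_times = []
    for k, x in enumerate(data, start=1):
        # Step 2 (SIS): parameter-only case, so only the weights are updated.
        log_w = log_w - 0.5 * (x - points) ** 2
        log_w -= np.logaddexp.reduce(log_w)
        w = np.exp(log_w)
        # Step 3: Kullback-Leibler distance (4) against the batch-start weights.
        mask = w > 0
        kl = np.sum(w[mask] * (log_w[mask] - log_w_ref[mask]))
        if kl > threshold:
            # Step 3(a): Gaussian proposal fitted to the weighted sample (sd inflated).
            mean = np.sum(w * points)
            sd = 2.0 * np.sqrt(np.sum(w * (points - mean) ** 2)) + 1e-3
            points = independence_mh(data[:k], mean, sd, n, rng)
            log_w = np.full(n, -np.log(n))    # MCMC output is equally weighted
            log_w_ref = log_w.copy()
            mcmc_times.append(k)              # Step 3(c): a new batch starts here
    return points, np.exp(log_w), mcmc_times

data = np.random.default_rng(1).normal(1.5, 1.0, size=60)
points, weights, mcmc_times = sequential_mcmc_filter(data)
estimate = np.sum(weights * points)
```

The list `mcmc_times` records the measurement indices at which MCMC was run, so the gaps between successive entries are the adaptively chosen batch sizes.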

3.  SIMULTANEOUS  DETECTION  AND 
ESTIMATION 

We  conducted  an  experiment  in  an  acoustic  anechoic  chamber  to 
record  a  weak  ultrasonic  acoustic  chirp,  and  then  processed  it 
with  our  sequential  MCMC  filter.  A  linear  chirp  with  chirp  rate 


of  170.75  kHz/s  was  generated  and  transmitted  through  an 
electrostatic  transducer.  A  microphone,  placed  some  distance 
away  (not  exceeding  10  m)  from  the  transmitter,  received  the 
acoustic  chirp  signal  and  recorded  it  at  a  sampling  rate  of  250 
kHz.  The  noise  in  the  recorded  data  comprised  ambient  noise  and 
circuit  noise.  We  suspected  that  the  latter  was  dominant  because 
acoustic  noise  in  the  anechoic  chamber  was  very  low.  We 
analysed  the  measurement  noise  and  found  the  Gaussian  model  to 
be  adequate. 

We  knew  that  the  received  chirp  was  weak  but  it  was  not  easy 
to  estimate  its  SNR.  Only  after  processing  by  the  sequential 
MCMC  filter  did  we  realise  that  the  SNR  was  about  -14  dB.  The 
raw  recorded  data  required  some  pre-processing  (scaling  and 
bandpass  filtering  with  pass  band  from  4  to  124  kHz)  to  remove 
certain hardware artefacts. After all this pre-processing, the only
parameter of the real signal that is known exactly is the chirp rate.

To perform simultaneous detection and parameter estimation,
we used the following models for the sequential MCMC filter:

$H_1: x_t = w_t$, $w_t \sim N(0, \sigma_1^2)$. (5)

$H_2: x_t = \alpha \sin[2\pi(\beta t^2 + \gamma t + \phi)] + w_t$, $w_t \sim N(0, \sigma_2^2)$. (6)

Here, $\alpha$ is the amplitude, $\beta$ is the chirp rate, and $\gamma$ and $\phi$ are
parameters that have the same dimension as frequency and phase
shift respectively. The parameter vectors for the two models are

$\psi^{(1)} = \sigma_1^2$, (7)

$\psi^{(2)} = (\alpha, \beta, \gamma, \phi, \sigma_2^2)$. (8)

An alternative parameterization for model 2 is to use maximum
frequency and minimum frequency, $\gamma_{\max}$ and $\gamma_{\min}$, instead of $\beta$
and $\gamma$, since it can be shown that

$\gamma = \gamma_{\min}$, (9)

$\beta = \dfrac{\gamma_{\max} - \gamma_{\min}}{2T}$, (10)

where $T$, the duration of the data to be processed, is known. So
we have

$\psi^{(2)} = (\alpha, \gamma_{\max}, \gamma_{\min}, \phi, \sigma_2^2)$. (11)
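
The reparameterization in (9) and (10) can be checked numerically: the instantaneous frequency of the model-2 signal, obtained by differentiating the phase argument, should run from the minimum frequency at time 0 to the maximum frequency at time T. The sketch below assumes the quadratic-phase form of (6); the function names and the example frequencies of 20 kHz and 21 kHz are hypothetical, while the 250 kHz sampling rate and 2048 samples are taken from the experiment description.

```python
import numpy as np

def chirp_params(gamma_min, gamma_max, duration):
    """Map (gamma_min, gamma_max) to (beta, gamma) using (9) and (10)."""
    gamma = gamma_min
    beta = (gamma_max - gamma_min) / (2.0 * duration)
    return beta, gamma

def chirp_signal(t, alpha, beta, gamma, phi):
    """Noise-free part of model H2: alpha * sin(2*pi*(beta*t^2 + gamma*t + phi))."""
    return alpha * np.sin(2.0 * np.pi * (beta * t ** 2 + gamma * t + phi))

def instantaneous_frequency(t, beta, gamma):
    # Time derivative of the phase argument beta*t^2 + gamma*t + phi.
    return 2.0 * beta * t + gamma

fs = 250e3                       # sampling rate in Hz
n_samples = 2048
T = n_samples / fs               # duration of the processed data in seconds
beta, gamma = chirp_params(20e3, 21e3, T)   # hypothetical frequency limits in Hz
t = np.arange(n_samples) / fs
x = chirp_signal(t, 1.0, beta, gamma, 0.0)
f_start = instantaneous_frequency(0.0, beta, gamma)
f_end = instantaneous_frequency(T, beta, gamma)
```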

We started with equal model probabilities of 1/2 and with
1000 sample points, assigning 500 points to each model. Setting
the maximum possible amplitude value to $\sqrt{2}$ V (corresponding
to signal-to-noise ratio of 0 dB), the prior for amplitude, $\alpha$, for $H_2$
was $U(0, \sqrt{2})$ (in V) to reflect the lack of prior information.
Since the sampling rate was 250 kHz, the maximum
instantaneous frequency permissible to avoid aliasing was 125
kHz. To reflect the lack of prior information, the prior
distribution for the maximum frequency, $\gamma_{\max}$, was chosen to be
$U(0, 125)$ (in kHz). The prior for minimum frequency, $\gamma_{\min}$, was
modeled as $U(0, \gamma_{\max})$, and the prior for $\phi$ was chosen to be
$U(0, 2\pi)$. Lastly, the prior for noise power, $\sigma^2$, in both models
was modeled using a lognormal distribution with a median of 1
and standard deviation of 0.2 to represent a noise power of
around 1 W.
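
Drawing the initial sample points for the chirp-plus-noise model from these priors might look as follows. One point is an interpretation rather than something the text states: the "standard deviation of 0.2" is taken here to be the standard deviation of the logarithm of the noise power, so that the median is exp(0) = 1. The function name and dictionary keys are hypothetical.

```python
import numpy as np

def sample_h2_prior(n, rng):
    """Draw n points from the priors described for the chirp-plus-noise model."""
    alpha = rng.uniform(0.0, np.sqrt(2.0), size=n)        # amplitude in V
    gamma_max = rng.uniform(0.0, 125.0, size=n)           # maximum frequency in kHz
    gamma_min = rng.uniform(0.0, gamma_max)               # in kHz, conditional on gamma_max
    phi = rng.uniform(0.0, 2.0 * np.pi, size=n)
    # Assumption: 0.2 is the standard deviation of log(noise power); median is exp(0) = 1.
    noise_power = rng.lognormal(mean=0.0, sigma=0.2, size=n)
    return {"alpha": alpha, "gamma_max": gamma_max, "gamma_min": gamma_min,
            "phi": phi, "noise_power": noise_power}

prior_sample = sample_h2_prior(500, np.random.default_rng(0))   # 500 points, as in the text
```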

We processed the acoustic chirp data with the sequential
MCMC filter and were able to detect the chirp after 1195
measurements. This is shown in Figure 1, which shows the
posterior probability that a chirp is present reaching 1 at
measurement 1195. In comparison, we were not able to detect the
chirp with the short-time Fourier transform (STFT) with the same
1195 measurements. However, techniques that are specially
designed to detect linear chirps may fare better than the STFT.
For example, the Radon ambiguity transform (RAT) [8] is able to
detect the acoustic chirp, and provides an estimate of chirp rate
only. In contrast, the sequential MCMC filter provides estimates
of chirp rate (in terms of minimum frequency and maximum
frequency), amplitude, initial phase and noise power. These are
shown in Figure 2 as marginal medians and quartiles after
processing 2048 measurements.


Figure 1. Probability of the chirp-plus-noise model in the
acoustic chirp experiment.

Figure 2. Marginal medians and quartiles of parameters in the
chirp-plus-noise  model  from  the  sequential  MCMC  filter  for 
2048  measurements  from  the  acoustic  chirp  experiment. 


4.  CONCLUSION 

We  have  described  an  algorithm  for  Bayesian  computations  that 
combines  SIS  with  MCMC.  The  algorithm  performs  MCMC 
sequentially  on  batches  of  measurements  whose  sizes  are 
determined  adaptively,  hence  the  name  sequential  MCMC  filter. 
Within a batch of measurements, SIS is used for computational
efficiency. MCMC is needed from time to time to “refresh” the
Monte Carlo sample and to remove any errors that may have
accumulated from SIS due to the finite sample size. When
MCMC is performed, it does not start from scratch but constructs
its  proposal  distribution  from  the  most  recent  Monte  Carlo 
sample  produced  by  SIS.  Adaptive  batch  sizing  is  based  on  an 
easy-to-compute  Kullback-Leibler  distance.  Bigger  batches  mean 
that  MCMC  is  needed  less  often.  For  parameter-only  problems 
that  we  have  worked  on,  we  have  observed  a  trend  of  increasing 
batch  size  over  time. 

By  incorporating  multiple  models,  we  have  demonstrated  the 
filter’s  ability  to  perform  simultaneous  model  selection  and 
parameter  estimation.  In  the  real-data  experiment  with  the 
acoustic  chirp,  some  degree  of  model  mismatch  is  unavoidable, 
but the sequential MCMC filter performs reasonably well,
suggesting tolerance to model mismatch.

With  today’s  computers,  the  sequential  MCMC  filter  is 
computationally  feasible  for  parameter-only  problems.  For 
problems  with  dynamic  states,  the  growing  dimension  of  the 
cumulative  state  is  a  severe  obstacle  to  implementing  the 
algorithm.  We  are  exploring  ways  to  overcome  this,  including  a 
compression scheme that looks promising.

ABSTRACT

A framework for positioning, navigation and tracking problems
using particle filters (recursive Monte Carlo methods)
is developed. Automotive and airborne applications, approached
in this framework, have proven a numerical advantage over
classical Kalman filter based algorithms. Here the use of
non-linear measurement models and non-Gaussian measurement
noise is the main explanation for the improvement in
accuracy, and models for relevant sensors are surveyed.

1.  INTRODUCTION 

Recursive implementations of Monte Carlo based statistical
signal processing [6] are known as particle filters, see
[4, 3]. These may be a serious alternative for real-time
applications classically approached by model-based Kalman
filter techniques [9, 8]. The more non-linear the model, or the
more non-Gaussian the noise, the more potential particle filters
have, especially in applications where computational power
is rather cheap and the sampling rate slow. Research has
steadily intensified since the paper [7].

The paper describes a general framework for a number
of applications where we have implemented the particle filter.
The outline is as follows. We start with a general
framework of models covering all of our applications in
Section 2, and in Section 3 relevant sensors and their measurement
models are surveyed. Section 4 describes how a number
of applications we have studied fit into the framework,
and the actual sensors we use. Conclusions and
open questions of general interest are discussed in Section
5.

2.  MODELS 

Central for all navigation and tracking applications is the
motion model to which various kinds of model-based filters
can be applied. Models which are linear in the state dynamics
and non-linear in the measurements and with additive
noise are considered:

$x_{t+1} = A x_t + B_u u_t + B_w w_t$, (1a)

$y_t = h(x_t) + e_t$. (1b)

The current affiliations of all but the first author are, in order of
appearance, Ericsson Radio, SaabTech Systems, NIRA Dynamics, Volvo CC,
Saab Aircraft and Saab Bofors Dynamics, respectively.


The signals of primary interest in navigation and tracking
applications are related to position, velocity and acceleration
as summarized in Table 1. Depending on whether the
signals are measurable or not, they may be components of
either the state vector $x_t$ or the input signal $u_t$.

Object | Position  | Velocity  | Acceleration
Own    | $p^{(1)}$ | $v^{(1)}$ | $a^{(1)}$, $\delta a^{(1)}$ (acc. bias)
Other  | $p^{(2)}$ | $v^{(2)}$ | $a^{(2)}$

Table 1. Interesting signals in navigation and tracking
applications. The indexes (1) and (2) indicate signals related
to one's own and another platform, respectively. All quantities
can belong to either one, two or three-dimensional
spaces, depending on the application.

Motion models (1a) are thoroughly discussed in the literature,
see e.g. [1, 8]. Note that the same kind of model can
be used in all applications for both navigation and tracking.
The main difference between the applications lies in
the availability of measurements. Section 3 provides an
extensive list of possible measurement equations (1b) that can
be combined arbitrarily.
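
To make the model structure (1a) and (1b) concrete, the following sketch simulates a two-dimensional position driven by a measured velocity input, observed through a non-linear range measurement to a single beacon. The matrices, the beacon position, the noise levels and the name `simulate` are illustrative assumptions, not values from the paper.

```python
import numpy as np

def simulate(A, B_u, B_w, h, inputs, x0, w_sd, e_sd, rng):
    """Simulate x_{t+1} = A x_t + B_u u_t + B_w w_t and y_t = h(x_t) + e_t."""
    x = np.asarray(x0, dtype=float)
    states, measurements = [], []
    for u_t in inputs:
        y_clean = np.atleast_1d(h(x))
        measurements.append(y_clean + e_sd * rng.standard_normal(y_clean.shape))
        states.append(x)
        w_t = w_sd * rng.standard_normal(B_w.shape[1])   # Gaussian process noise
        x = A @ x + B_u @ u_t + B_w @ w_t
    return np.array(states), np.array(measurements)

dt = 1.0
A = np.eye(2)                          # position state only
B_u = dt * np.eye(2)                   # velocity enters as the input signal
B_w = dt * np.eye(2)
beacon = np.array([100.0, 0.0])        # hypothetical known reference position
h = lambda p: np.linalg.norm(beacon - p)   # non-linear range measurement
inputs = np.tile([1.0, 0.5], (50, 1))
states, ys = simulate(A, B_u, B_w, h, inputs, [0.0, 0.0], 0.1, 1.0,
                      np.random.default_rng(0))
```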

3.  MEASUREMENT  EQUATIONS 

The main difference between the considered applications is
the measurements available. Basically, the measurements
are related to the positions of one's own platform $p_t^{(1)}$ and of
the other object $p_t^{(2)}$. Therefore, the measurement equations
can be categorized as depending on $p_t^{(1)}$ only, or depending
on both $p_t^{(1)}$ and $p_t^{(2)}$:

$y_t^{(1)} = h^{(1)}(p_t^{(1)}) + e_t^{(1)}$, (2a)

$y_t^{(2)} = h^{(2)}(p_t^{(1)}, p_t^{(2)}) + e_t^{(2)}$, (2b)

where the measurement noise contributions $e_t^{(1)}$ and $e_t^{(2)}$ are
characterized by their distributions. If not explicitly mentioned,
a Gaussian distribution is used.

In the studied applications, measurements from at least
one of the categories above are available. It is important to
note that any combination of the sensors is possible. The
presented applications are just a few examples.


3.0.1.  Measurements  of  Relative  Distance 


As always, any position has to be related to a coordinate
system and a reference position. Several types of sensors
(e.g. GPS, RF) basically measure the distance relative to
that reference point. One possibility is distance measurements
of one's own position relative to points of known
positions $p_i$, $i = 1, \ldots, M$, which yields $M$ measurement
equations with

$h_{a,i}^{(1)}(p_t^{(1)}) = \| p_i - p_t^{(1)} \|$, $i = 1, \ldots, M$. (2c)

This is also applicable when the position of another object is
related to one’s own position (e.g. radar, sonar, ultrasound):

$h_b^{(2)}(p_t^{(1)}, p_t^{(2)}) = \| p_t^{(1)} - p_t^{(2)} \|$. (2d)

Some sensors do not measure the relative distance explicitly,
but rather a quantity related to it. One example
is sensors that measure the received radio signal power
transmitted from a known position $p_i$. This received power
typically decays as $\sim K_1 / r^{\alpha}$, $\alpha \in [2, 5]$, where $K_1$ and $\alpha$
depend on the radio environment, antenna characteristics,
terrain etc. In a logarithmic scale, the measurements
are given by

$h_{c,i}^{(1)}(p_t^{(1)}) = K - \alpha \log_{10} \| p_i - p_t^{(1)} \|$, $i = 1, \ldots, M$, (2e)

where $K = \log_{10} K_1$. Analogously, we can consider the
situation when we focus on the power or intensity transmitted
or reflected from an object and received at one’s own
position. The measurement is thus modeled by

$h_d^{(2)}(p_t^{(1)}, p_t^{(2)}) = K - \alpha \log_{10} \| p_t^{(1)} - p_t^{(2)} \|$. (2f)

3.0.2.  Measurements  of  Relative  Angle 

Similarly, the sensors can measure the relative angle between
two positions (e.g. radar, IR, sonar, ultrasound). Given
points of known positions $p_i$, $i = 1, \ldots, M$, the relative
angle measurements can be described by

$h_{e,i}^{(1)}(p_t^{(1)}) = \mathrm{angle}(p_i - p_t^{(1)})$, $i = 1, \ldots, M$. (2g)

When relating the angle of an object to one’s own position,
we have

$h_f^{(2)}(p_t^{(1)}, p_t^{(2)}) = \mathrm{angle}(p_t^{(2)} - p_t^{(1)})$. (2h)
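
In two dimensions a relative angle can be computed with the four-quadrant arctangent. The convention used below (angle measured counter-clockwise from the x-axis, in radians) is an assumption, since the text does not fix one, and the helper `wrap_angle` is added here because angle residuals must be compared modulo a full turn.

```python
import numpy as np

def h_angle(p_own, p_other):
    """Angle of the vector from p_own to p_other, in radians in (-pi, pi]."""
    d = np.asarray(p_other, float) - np.asarray(p_own, float)
    return np.arctan2(d[1], d[0])

def wrap_angle(a):
    """Map an angle (or angle difference) into the interval [-pi, pi)."""
    return (a + np.pi) % (2.0 * np.pi) - np.pi

bearing = h_angle([0.0, 0.0], [0.0, 1.0])
residual = wrap_angle(1.5 * np.pi)
```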


3.0.3.  Measurements  of  Relative  Velocity 

Some sensors (e.g. radar) typically measure the Doppler
shift of signal frequencies to estimate the magnitude of the
relative velocity. This is essentially only applicable when
relating the velocity of an object to one’s own velocity. The
measurements are categorized by

$h_g^{(2)}(\cdot) = \| v_t^{(2)} - v_t^{(1)} \|$. (2i)

3.0.4.  Map  Related  Measurements 

An aircraft can compute the ground altitude from radar
measurements of height over ground and barometric measurements
from which altitude is computed. The measured terrain
height together with relative movement from the INS
build up a height profile. Thus, $h_h^{(1)}(p_t^{(1)})$ denotes the height
at point $p_t^{(1)}$ according to the Geographical Information System
(GIS). Much effort has been spent on modeling the
measurement error $e_t^{(1)}$ in a realistic way. It has turned out
that a Gaussian mixture with two modes works well. One
mode has zero mean, and the other a positive mean which
corresponds to radar echoes from the tree tops. The ground
type in GIS can be used to switch the mean and variances
in the Gaussian mixture. For instance, over sea there is only
one mode with a small variance.

For map matching in the car positioning case, there is no
real measurement. Instead, $h_i^{(1)}(p_t^{(1)})$ denotes the distance
to the nearest road, and the measurement
should therefore be equal to zero. A simple and relevant
noise model is white and zero mean Gaussian noise.
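
A two-mode Gaussian mixture of this kind is straightforward to evaluate as a measurement-error density. In the sketch below the mixture weight, the tree-top mean of 15 and the standard deviations are made-up values for illustration, and setting the tree-top weight to zero gives the single-mode case mentioned for sea.

```python
import numpy as np

def normal_pdf(x, mean, sd):
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def altitude_error_pdf(e, w_tree, mean_tree, sd_ground, sd_tree):
    """Two-mode mixture: a zero-mean ground mode and a positive-mean tree-top mode."""
    return ((1.0 - w_tree) * normal_pdf(e, 0.0, sd_ground)
            + w_tree * normal_pdf(e, mean_tree, sd_tree))

e = np.linspace(-50.0, 80.0, 20001)
forest = altitude_error_pdf(e, w_tree=0.4, mean_tree=15.0, sd_ground=3.0, sd_tree=5.0)
sea = altitude_error_pdf(e, w_tree=0.0, mean_tree=15.0, sd_ground=1.0, sd_tree=5.0)
total_mass = np.sum(forest) * (e[1] - e[0])   # Riemann sum, should be close to 1
```

In a particle filter, evaluating this density at the difference between the measured and map-predicted terrain height would give each particle's likelihood weight.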


4.  APPLICATIONS 

The  problem  areas  are 

•  Positioning, where one’s own position is to be estimated.
This is a filtering problem rather than a static
estimation problem, when an inertial navigation system
is used to provide measurements of movement.

•  Navigation, where besides the position also velocity,
attitude and heading, acceleration and angular rates
are included in the filtering problem.

•  Target tracking, where another object’s position is to
be estimated based on measurements of relative range
and angles to one’s own position.

These problems are related in that they can be described by
quite similar state space models. Traditional methods are
based on linearized models and Gaussian noise approximations
so that the Kalman filter can be applied. Research is




| Application | State vector | Input | Measurement equations |
|---|---|---|---|
| Car positioning | $p_t^{(1)}$ | $v_t^{(1)}$ | Road map $h(p_t^{(1)})$, possibly GPS or base station distances $h(p_t^{(1)})$, base station powers $h(p_t^{(1)})$ |
| Aircraft positioning | $p_t^{(1)}$ | [illegible] | Altitude map $h(p_t^{(1)})$, GPS or other reference beacons |
| Navigation in aircraft | $p_t^{(1)}, \ldots$ [remainder illegible] | [illegible] | Altitude map $h(p_t^{(1)})$, GPS or other reference beacons $h(p_t^{(1)})$ |
| Tracking | $p_t^{(2)}, v_t^{(2)}$ | | Distance $h^{(2)}(p_t^{(1)}, p_t^{(2)})$, bearing $h^{(2)}(p_t^{(1)}, p_t^{(2)})$, Doppler $h^{(2)}(p_t^{(1)}, p_t^{(2)})$, intensity $h^{(2)}(p_t^{(1)}, p_t^{(2)})$ |
| Navigation and tracking in aircraft | $p_t^{(1)}, v_t^{(1)}, \delta a_t^{(1)}, p_t^{(2)}, v_t^{(2)}$ | | Altitude map $h(p_t^{(1)})$, GPS or other reference beacons; distance, bearing, Doppler and intensity $h^{(2)}(p_t^{(1)}, p_t^{(2)})$ |
| Navigation and tracking in cars | $p_t^{(1)}, v_t^{(1)}, p_t^{(2)}, v_t^{(2)}$ | | Road map $h(p_t^{(1)})$, possibly GPS or base station distances $h(p_t^{(1)})$, base station powers $h(p_t^{(1)})$; distance, bearing, Doppler and intensity $h^{(2)}(p_t^{(1)}, p_t^{(2)})$ |

Table 2. List of considered applications with respective state vector (cf. Table 1), input signal and sensor information.


focused on how different state coordinates or multiple models can be
used to limit the approximations. In contrast to this, the particle
filter approximates the optimal solution numerically based on a
physical model, rather than applying an optimal filter to an
approximate model. The applications we have studied on real data are
described below.

Car positioning by map matching. A digital road map is used to
constrain the possible positions, where dead reckoning of wheel
speeds is the main external input to the algorithm. By matching the
driven path to a road map, a vague initial position (on the order of
kilometers) can be improved to meter-level accuracy. This principle
can be used as a supplement to, or even a replacement for, GPS
(global positioning system).

Car positioning by Radio Frequency (RF) measurements. The digital
road map above can be replaced by, or supplemented by, measurements
from a terrestrial wireless communications system. For handover
operation (transferring a connection from one base station to
another), the mobile stations monitor the received signal powers from
a multitude of base stations, and report regularly to the network.
These measurements provide a power map which can be used in a similar
manner as above. Mobile stations in the near future will moreover
provide the possibility of monitoring the traveled distance of the
radio signals from a number of base stations [5]. Such measurements
can be utilized in the same manner as the power measurements.

Aircraft positioning by map matching or terrain navigation. A GIS
contains, among other information, terrain elevation. The aircraft is
equipped with sensors such that the terrain elevation can be
measured. By map matching, the position can be deduced [2].

Integrated navigation. The aircraft's Inertial Navigation System
(INS) uses dead reckoning to compute navigation and flight data, i.e.
position, velocity, attitude and heading. The INS is regarded as the
main sensor for navigation and flight data because it is autonomous
and highly reliable. However, small offsets cause drift, and its
output has to be stabilized. Here, terrain navigation is used today.

Target tracking. A classical problem in the signal processing
literature, where radar or IR measures relative angle, and for radar
also relative range and range rate, to the object [1]. For the case
of a bearings-only IR sensor, either the state dynamics or the
measurement equation is very non-linear depending on the choice of
state coordinates, so here the particle filter is particularly
promising.

Combined  navigation  and  tracking.  Because  the  target 
tracking  measurements  are  relative  to  one’s  own  platform, 
positioning  is  an  important  sub-problem.  Since  the  sensor 
introduces  a  cross-coupling  between  the  problems,  a  unified 
treatment  is  tempting. 

Car collision avoidance is very similar to the target tracking
problem; here we are interested in predicting the future positions of
one's own car and of other objects. Based on the prediction,
collision avoidance actions such as warning, braking and steering are
undertaken when a collision is likely to happen. In order to have
enough time to warn the driver, the prediction horizon needs to be
quite long. Therefore, utilizing knowledge about road geometry and
infrastructure becomes important. One way to improve the prediction
of possible maneuvers is to use information in a digital map. Thus,
this is a specific project including all aspects of the problems
listed above.

Typical state vectors, input signals and available (non-linear)
sensor information are summarized in Table 2.

5.  CONCLUSIONS  AND  DISCUSSION 

We have given a general framework for positioning and navigation
applications based on a flexible state space model and a particle
filter. Five applications illustrate its use in practice. Evaluations
in real-time, off-line on real data and in simulation environments
show a clear improvement in performance compared to existing Kalman
filter based solutions, where the new challenge is to find non-linear
relations, state constraints and non-Gaussian sensor models that
provide the most information about the position. Thus, modeling is
the most essential step in this approach, compared to the various
implementations of the Kalman filter found in this context
(linearization issues, choice of state coordinates, filter banks,
Gaussian sum filters, etc.).
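
The recursion behind such a particle filter can be sketched as follows. This is a minimal bootstrap filter for a scalar position with a velocity input and a direct, noisy position measurement, which is a deliberate simplification of the map-based measurement equations in this paper. It adds a small extra state noise after resampling (jittering, also called roughening). All function names, noise levels and the demonstration scenario are assumptions chosen for illustration.

```python
import math
import random


def particle_filter_step(particles, velocity, y, meas_std, proc_std, jitter_std, rng):
    """One recursion of a bootstrap particle filter for a scalar position."""
    # Time update: dead reckoning with the measured velocity plus process noise.
    predicted = [x + velocity + rng.gauss(0.0, proc_std) for x in particles]
    # Measurement update: Gaussian likelihood of the position measurement y.
    weights = [math.exp(-0.5 * ((y - x) / meas_std) ** 2) for x in predicted]
    total = sum(weights)
    if total == 0.0:
        # Every weight underflowed, a crude sign of divergence; fall back to uniform.
        weights = [1.0 / len(predicted)] * len(predicted)
    else:
        weights = [w / total for w in weights]
    estimate = sum(w * x for w, x in zip(weights, predicted))
    # Resampling with replacement, followed by jittering (roughening).
    resampled = rng.choices(predicted, weights=weights, k=len(predicted))
    jittered = [x + rng.gauss(0.0, jitter_std) for x in resampled]
    return jittered, estimate


def demo(seed=0, steps=30, n_particles=2000):
    """Track a target from a vague initial position; returns (truth, estimate)."""
    rng = random.Random(seed)
    truth = 5.0
    particles = [rng.uniform(-50.0, 50.0) for _ in range(n_particles)]
    estimate = None
    for _ in range(steps):
        truth += 1.0 + rng.gauss(0.0, 0.1)
        y = truth + rng.gauss(0.0, 1.0)
        particles, estimate = particle_filter_step(
            particles, 1.0, y, meas_std=1.0, proc_std=0.1, jitter_std=0.05, rng=rng
        )
    return truth, estimate
```

Replacing the Gaussian likelihood line with a map-based likelihood is the only change needed to move this sketch closer to the applications of Table 2.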

General conclusions from the implementations are as follows: A choice
of state coordinates making the state equation linear is beneficial
for computation time and opens up the possibility for
Rao-Blackwellization. This procedure enables a significant decrease
in the particle state dimension. The evaluation of the likelihood one
step ahead before resampling (APF [10], prior editing) is, together
with adding extra state noise (jittering, roughening), crucial for
avoiding divergence, and implies that the number of particles can be
decreased further. Our implementations run in real-time (1-10 Hz),
even in Matlab using several thousand particles. Open questions for
further research and development are listed below:

Divergence tests. It is essential to have a reliable way to detect
divergence and to restart the filter (for the latter, see the
transient improvement below). For car positioning, the number of
resamplings in the prior editing step turned out to be a very good
indicator of divergence. Another idea, used in the terrain navigation
implementation where the sampling rate is higher than necessary, is
to split up the measurements to a filter bank, so that particle
filter number $i$, $i = 1, 2, \ldots, n$, gets every $n$'th sample.
The results of these $n$ particle filters are approximately
independent, and voting can be used to restart each filter. This has
turned out to be an efficient way to remove the outliers in data.
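
The filter bank idea can be sketched in a few lines. The interleaved split follows the description above; the voting rule used here (median of the filter estimates, with a fixed tolerance) is only one possible choice and is an assumption of this example, as are the function names.

```python
from statistics import median


def split_measurements(measurements, n):
    """Give filter number i (0-based here) every n'th sample, starting at sample i."""
    return [measurements[i::n] for i in range(n)]


def vote_and_flag(estimates, tolerance):
    """Return the consensus estimate and the indices of filters to restart.

    Assumed voting rule: the consensus is the median of the filter estimates,
    and a filter is flagged when it deviates from it by more than `tolerance`.
    """
    consensus = median(estimates)
    to_restart = [i for i, e in enumerate(estimates) if abs(e - consensus) > tolerance]
    return consensus, to_restart
```

A flagged filter would then be reinitialized around the consensus estimate, which is where the transient behavior discussed next becomes relevant.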

Transient improvement. The time it takes until the estimate accuracy
comes down to the stationary value (the Cramér-Rao bound) depends on
the number of particles. Given limited computational time, it may be
advantageous to increase the number of particles $N$ after a restart
and discard samples in such a way that $N \cdot f_s$ is constant.

Since the particle filter has shown good improvement over
linearization approaches, it is tempting to try even more accurate
non-linear models. In particular, the flight dynamics of one's own
vehicle is known and indeed used in model-based control, but is very
rarely used in navigation applications. As a possible improvement,
the particle filter may take full advantage of a more accurate model,
where parts of the non-linear dynamics from driver/pilot inputs are
incorporated.

ABSTRACT

In this paper we address the problem of multiuser CDMA detection
under fading conditions. The optimal detection problem can be
reformulated as an optimal filtering problem for jump Markov linear
systems, i.e. linear Gaussian state space models switching according
to an unobserved finite state space Markov chain. Several approaches
based on particle filtering techniques are reviewed to perform
optimal filtering in this framework. A brief simulation study is
carried out.

1.  INTRODUCTION 

Code division multiple access (CDMA) systems allow a significant
increase of the capacity of cellular networks. It is likely they will
be used not only in 3G mobile systems but also in the following
generations. This is why CDMA systems have recently been under
intensive research. A significant thrust of this research has focused
on the multiuser CDMA detection problem in multipath fading
environments [2, 7, 11]. Multipath fading results in a significant
increase of both the intersymbol interference (ISI) among the data
symbols of the same user, and the multiple-access interference (MAI)
among the data symbols of different users. These, added to a possibly
non-Gaussian (impulsive) nature of the ambient noise in some physical
channels such as urban and indoor radio channels, make the problem of
symbol detection extremely difficult.

Under conditions of fading channels, the CDMA transmission model can
be expressed in a state-space representation. Thus, in principle,
general recursive expressions for the posterior distribution of the
symbols may be derived, from which estimates of the symbols can be
obtained. However, the problem has proved to be a difficult one.
Indeed, the exact computation of these estimates involves a
prohibitive computational cost exponential in the growing number of
observations, and thus approximate methods must be employed.

In this paper, we concentrate on the problem of multiuser CDMA
detection under conditions of Rayleigh flat (frequency-nonselective)
fading channels. Our approach is based on particle filtering
techniques, efficient simulation-based algorithms that have recently
appeared in the literature (see [5] for a state-of-the-art survey of
this field). The key idea of particle filters is to use an adaptive
stochastic grid approximation of the conditional probability of the
state vector, with particles (values of the grid) evolving randomly
in time according to a simulation-based rule. Depending on their
ability to represent the different zones of interest of the state
space, which is dictated by the observation process and the dynamics
of the underlying system, the particles can either give birth to
offspring particles or die. The method uses several variance
reduction techniques designed to make use of the structure of the
model.

In [10], we applied particle filtering techniques to the problem of
demodulation in fading channels. Here we develop a similar method in
the more complex framework of CDMA systems. We also review and
compare our approach with alternative deterministic and stochastic
algorithms presented previously in the literature. Preliminary
results indicate that the choice of the algorithm is very application
dependent; a simple deterministic method can perform better than
particle filtering techniques in some applications whereas it cannot
even be realistically applied in other cases.

The remainder of the paper is organized as follows. The model
specification and estimation objectives are stated in Section 2. It
is shown that performing optimal estimation of symbols requires
solving an optimal filtering problem. Section 3 introduces and
reviews several deterministic and stochastic schemes to approximate
the optimal filter. In Section 4 simulation results comparing various
approaches are presented, and some conclusions are reached in
Section 5.

2.  PROBLEM  STATEMENT  AND  ESTIMATION 
OBJECTIVES 

Let us consider the downlink$^1$ of a synchronous CDMA system that is
shared by $L$ simultaneous users (see Fig. 1). Let us denote, for any
generic sequence $\kappa_t$, $\kappa_{i:j} \triangleq (\kappa_i,
\kappa_{i+1}, \ldots, \kappa_j)^{\mathsf{T}}$, and let $r_n^{(l)}$ be
the $n$th information symbol from the $l$th user and
$s_{\mathrm{trans}}^{(l)}(\tau)$ be the corresponding equivalent
lowpass signal waveform given by

$$s_{\mathrm{trans}}^{(l)}(\tau) = \sqrt{E_l}\, s_n(r_n^{(l)})\, u^{(l)}(\tau), \quad (n-1)T < \tau \le nT,$$

where $s_n(\cdot)$ performs the mapping from the digital sequence to
waveforms and corresponds to the modulation technique employed,

$^1$ Our method can equivalently be applied to the uplink multiuser
problem with asynchronous CDMA.

Fig. 1. Transmission of M-ary modulated signals in Rayleigh fading channels.


$E_l$ is the signal energy per symbol ($s_n$ is normalized to have
unit power), and $u^{(l)}(\tau)$ is the signature waveform for the
$l$th user, $u^{(l)}(\tau) = \sum_{h=1}^{H} a_h^{(l)} \eta(\tau - hT_c)$.
Here, $a_{1:H}^{(l)}$ is a pseudo-noise (PN) code sequence consisting
of $H$ chips (with values $\{\pm 1\}$) per symbol, $\eta(\tau - hT_c)$
is a pulse of duration $T_c$, and $T_c$ is the chip interval,
$T_c = T/H$.

The waveform goes through a flat$^2$ Rayleigh fading channel and is
corrupted by additive complex noise which is assumed to be
Gaussian$^3$. Thus, after matched filtering and sampling at the rate
$T_c^{-1}$, the complex output of the channel at instant
$t = (n-1)H + h$, $h = 1, \ldots, H$, corresponding to the
transmission of the $n$th symbols can be expressed as

$$y_t = A_h S_n(r_n)\, g_t + \sigma \epsilon_t, \quad \epsilon_t \sim \mathcal{N}_c(0, 1),$$

where $S_n(r_n) = \mathrm{diag}\left(\sqrt{E_1}\, s_n(r_n^{(1)}), \ldots,
\sqrt{E_L}\, s_n(r_n^{(L)})\right)$, $\sigma^2$ being the noise
variance, $A_h = \left(a_h^{(1)}, \ldots, a_h^{(L)}\right)$, and $g_t$
represents a multiplicative discrete time disturbance of the
channels, $g_t = \left(g_t^{(1)}, \ldots, g_t^{(L)}\right)^{\mathsf{T}}$,
which is at instant $t$ modelled as an ARMA($q$, $q$) process
(Butterworth filter of order $q$). The ARMA coefficients $a$ (AR
part) and $b$ (MA part) are chosen so that the cut-off frequency of
the filter matches the normalized channel Doppler frequency
$f_d T_c$, which is known. Thus, the problem can be formulated in a
linear Gaussian state space form (conditional upon the symbols); see
[9] for details of the representation.

The symbols $r_n = \left(r_n^{(1)}, \ldots, r_n^{(L)}\right)$, which
are assumed i.i.d., and the channel characteristics $g_t$
corresponding to the transmission of the $n$th symbol are unknown for
$n > 0$. Our aim is to estimate $r_n$ given the currently available
data $y_{1:n}$, where $y_n \triangleq y_{(n-1)H+1:nH}$. This can be
done using the MAP (maximum a posteriori) criterion:

$$\hat{r}_n = \arg\max_{r_n}\, p(r_n \mid y_{1:n}).$$

However, this problem does not admit any analytical solution as
computing $p(r_n \mid y_{1:n})$ involves a prohibitive computational
cost exponential in the (growing) number of observations and, thus,
approximate methods must be employed.

3. PARTICLE FILTERING

Given $y_{1:n}$, all Bayesian inference on $r_{1:n}$ relies on the
posterior distribution $p(r_{1:n} \mid y_{1:n})$, which we propose to
estimate using particle filtering techniques. The idea is to
approximate $p(r_{1:n} \mid y_{1:n})$ by swarms of weighted points in
the sample space, $\{r_{1:n}^{(i)}\}_{i=1}^{N}$, called particles.
The particles evolve randomly in time in correlation with each other,
and either give birth to offspring particles or die according to
their ability to represent the different zones of interest of the
state space.

A number of different algorithms of this type have been recently
proposed in the literature (see [5] for a survey); some of them
([1, 3, 4, 6], for example) are specifically designed to make use of
the structure of the model presented in Section 2. Here, we shall
consider the essential features of these approaches; the details of
the algorithms may be found in the appropriate references.

Sequential Importance Sampling and Resampling (SISR). The method is
based on the following remark. Suppose that $N$ particles
$\{r_{1:n}^{(i)}\}_{i=1}^{N}$ can be easily simulated according to an
arbitrary convenient importance distribution
$\pi(r_{1:n} \mid y_{1:n})$ (such that $p(r_{1:n} \mid y_{1:n}) > 0$
implies $\pi(r_{1:n} \mid y_{1:n}) > 0$). Then, using the importance
sampling identity, an estimate of $p(r_{1:n} \mid y_{1:n})$ is given
by the following point mass approximation:

$$\hat{p}_N(r_{1:n} \mid y_{1:n}) = \sum_{i=1}^{N} \tilde{w}_n^{(i)}\, \delta_{r_{1:n}^{(i)}}(r_{1:n}),$$

where $\tilde{w}_n^{(i)}$ is the normalized importance weight of the
$i$th particle.

$^2$ Frequency-selective channels can be considered in the same
framework.

$^3$ The case of non-Gaussian noise can be easily treated using the
techniques presented in [10].

In order to propagate this estimate sequentially in time,
$\pi(r_{1:n} \mid y_{1:n})$ has to admit
$\pi(r_{1:n-1} \mid y_{1:n-1})$ as a marginal distribution. In
addition, at each time step a selection step is included in the
algorithm in order to discard particles with low normalized
importance weights and multiply those with high ones. The choice of
the importance distribution and of a selection scheme is discussed in
[4]; depending on those being employed, the computational complexity
of the algorithm varies. As shown there,
$\pi(r_n \mid r_{1:n-1}, y_{1:n}) = p(r_n \mid r_{1:n-1}, y_{1:n})$
is an importance distribution that minimizes the conditional variance
of $w(r_{1:n})$ and, therefore, is "optimal" in the framework
considered (see [4] for details). However, for each particle it
requires evaluation of $M^L$ $H$-step-ahead Kalman filters for
detection of the $n$th symbols, since in this case

$$w_n \propto \sum_{m=1}^{M^L} p(y_n \mid r_{1:n-1}, r_n = \rho_m, y_{1:n-1}),$$

where $\rho_m$ corresponds to the $m$th ($m = 1, \ldots, M^L$)
possible realization of $r_n$ (see [9] for details). Thus, sampling
from the optimal distribution is computationally expensive if $M^L$
is large. In this case, the prior distribution can be used
alternatively as the importance distribution, i.e.
$\pi(r_n \mid y_{1:n}, r_{1:n-1}) = p(r_n \mid r_{n-1})$, so that, in
total, for each particle at time $t$ only one Kalman filter step is
calculated. However, this method can be inefficient as it does not
use the information carried by $y_n$ to explore the state space. As
far as the selection scheme is concerned, stratified sampling [8],
employed in this paper, can be implemented in $O(N)$ operations. The
details of the algorithm are described in [3, 4, 9]; a similar
approach is presented in [1].
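
A minimal sketch of one SISR step with the prior as importance distribution and stratified selection is given below. It is not the authors' implementation: the callable `likelihood` is a hypothetical stand-in for the Kalman filter predictive likelihood of the new observations given a symbol path, the symbols are drawn uniformly from `alphabet` (i.i.d. prior), and all names are assumptions of this example.

```python
import random


def stratified_resample(weights, rng):
    """Select len(weights) particle indices by stratified sampling in O(N).

    `weights` must be normalized; one uniform draw is made in each of the
    N equal-width strata of [0, 1).
    """
    n = len(weights)
    indices = []
    j = 0
    cumulative = weights[0]
    for i in range(n):
        u = (i + rng.random()) / n
        # Advance to the particle whose cumulative-weight interval contains u.
        while u >= cumulative and j < n - 1:
            j += 1
            cumulative += weights[j]
        indices.append(j)
    return indices


def sisr_step(paths, alphabet, likelihood, rng):
    """Extend each path with a symbol drawn from the prior, weight, and resample."""
    proposed = [path + [rng.choice(alphabet)] for path in paths]
    weights = [likelihood(path) for path in proposed]
    total = sum(weights)
    if total == 0.0:
        weights = [1.0 / len(proposed)] * len(proposed)
    else:
        weights = [w / total for w in weights]
    selected = stratified_resample(weights, rng)
    return [list(proposed[k]) for k in selected]
```

Because selection is performed at every step, the incoming particles carry equal weights, so only the new likelihood term enters the weight computation here.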

Deterministic/Resample Low Weights approaches (RLW). An alternative
approach to obtain the estimate of the posterior distribution
$p(r_{1:n} \mid y_{1:n})$ is based on the following approximation:

$$\hat{p}_{N \times M^L}(r_{1:n} \mid y_{1:n}) = \sum_{i=1}^{N} \sum_{m=1}^{M^L} \tilde{w}_n^{(i,m)}\, \delta_{\left(r_{1:n-1}^{(i)},\, \rho_m\right)}(r_{1:n}),$$

where

$$\tilde{w}_n^{(i,m)} \propto p(y_n \mid r_{1:n-1}^{(i)}, r_n = \rho_m, y_{1:n-1}). \quad (2)$$

Thus, we consider all possible "extensions" of the existing state
sequences at each step $n$, and each particle has $M^L$ offspring,
resulting in a set of $N \times M^L$ particles. These will each be
assigned weights that depend on the weight of the parent at step
$n - 1$ and on the likelihood term (2), which can be computed using
the Kalman filter. In terms of calculations, this is equivalent to
the use of the optimal distribution in SISR. However, when performing
inference on the symbol $r_n$, it is of course better to use
$\hat{p}_{N \times M^L}(r_n \mid y_{1:n})$ than the standard SISR
approximation; indeed, one does not unnecessarily discard any
information by randomly selecting one path out of the $M^L$
available.

In order to avoid the exponentially increasing number of particles, a
selection procedure has to be employed at each time step. The
simplest way to perform such selection is just to choose the $N$ most
likely offspring and discard the others (as, for example, in [12]). A
more complicated approach involves preserving the particles with high
weights and resampling the ones with low weights, thus reducing their
total number to $N$. In this particular case, a resampling scheme
without replacement should be designed, i.e. each particle should
appear at most once in the resulting set, as, indeed, there is no
point in carrying along two particles evolving in exactly the same
way. An algorithm of this type is presented in [6], but other
selection schemes can be designed.

Whether we choose to preserve the most likely particles or employ the
selection scheme proposed in [6], the computational load of the
resulting algorithms at each time step $t$ is that of $N \times M^L$
Kalman filters, and the selection step in both cases is implemented
in $O(N \times M^L \log (N \times M^L))$ operations. Of course, if
$M^L$ is large, which is the case in many applications (see
Section 4, for example), both these methods are too computationally
expensive to be used.
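
The simplest of these selection rules, keeping the $N$ most likely offspring, might be sketched as follows. As in any such illustration, `likelihood` is a hypothetical callable standing in for the Kalman filter evaluation of the likelihood term (2), `alphabet` enumerates the $M^L$ possible realizations of $r_n$, and the function name is an assumption of this example.

```python
import heapq


def extend_and_keep_most_likely(paths, weights, alphabet, likelihood, n_keep):
    """Generate every offspring of every particle and keep the n_keep best.

    Each offspring weight is the parent weight times the likelihood of the
    extended path. The retained weights are renormalized to sum to one.
    """
    offspring = []
    for path, parent_weight in zip(paths, weights):
        for symbol in alphabet:
            child = path + [symbol]
            offspring.append((parent_weight * likelihood(child), child))
    best = heapq.nlargest(n_keep, offspring, key=lambda item: item[0])
    total = sum(weight for weight, _ in best)
    new_paths = [child for _, child in best]
    new_weights = [weight / total for weight, _ in best]
    return new_paths, new_weights
```

The loop makes the cost visible: `len(paths) * len(alphabet)` likelihood evaluations per step, which is what rules this approach out when $M^L$ is large.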

4.  SIMULATION  RESULTS 

In order to demonstrate the bit-error-rate (BER) performance of our
(SISR) algorithm, it was first applied to the case of
binary-phase-shift-keyed (BPSK) symbols transmitted over fast fading
CDMA channels with $L = 3$, $H = 10$ and $f_d T_c = 0.05$. The
results for different average signal-to-noise ratios (SNR), compared
to those obtained in [2], are given in Fig. 2, where the ideal
channel state information (CSI) case is also presented. As one can
notice, even for just $N = 50$ particles, our algorithm substantially
outperforms that of [2], especially when the SNR is large.


Fig.  2.  Bit  error  rate  (dotted  line  +  (ideal  CSI),  solid  line  (SISR), 
dotted  line  x  ([2])). 


Then, computer simulations were carried out in order to compare the
performance of the algorithms presented in Section 3. Some results
for CDMA systems with the same parameters, SNR = 10 dB and $N = 50$,
are presented in Table 1. They are interesting in the sense that, in
this case, the deterministic approach preserving the $N$ most likely
particles (MLP) turned out to be the most efficient one! In order to
achieve the same BER using the more complicated selection scheme
(RLW) presented in [6], $N = 1000$ particles were required, and
$N = 5000$ was needed with SISR. These results must indeed be
interpreted cautiously. With other simulation parameters we found
that the differences between the algorithms were much less
pronounced. These issues need to be investigated further.

It should also be emphasized that if the number of users or
processing gain in CDMA is large, a more complex modulation scheme is
used and/or the additive noise is non-Gaussian (modelled as a mixture
of Gaussians), both MLP and RLW are of no use due to their
computational complexity, whereas our approach combined with Markov
chain Monte Carlo (MCMC) methods ([4]) and employing the prior as an
importance distribution leads to very good performance (see [9] for
the details and further results).

|     | MLP | RLW | SISR |
|---|---|---|---|
| BER | $2.51 \times 10^{-2}$ | $2.59 \times 10^{-2}$ | $2.70 \times 10^{-2}$ |

Table 1. Bit error rate for N = 50 and SNR = 10 dB.

5.  DISCUSSION 

In this paper, we consider the application of some particle filtering
techniques to the problem of multiuser CDMA detection under fading
conditions in the presence of possibly non-Gaussian additive noise.
The results presented indicate quite small performance degradation
compared to that of the receiver with ideal CSI. Moreover, additional
simulations show that the algorithm exhibits good performance in the
case of non-Gaussian additive noise, whereas other standard methods
are not actually designed to treat this case (see [9]). Similar
methods can also be applied to asynchronous CDMA systems and
frequency-selective fading channels.

As was mentioned above, the problem addressed in this paper can be
represented as a jump Markov linear system. We have reviewed several
approaches to perform (approximate) optimal filtering in this
framework. A simulation study has been carried out in order to
compare these algorithms for the CDMA detection problem. Such a
comparison has not been made before. In principle, all schemes are
capable of providing optimal performance given a large number of
particles. However, whenever it is applicable, we found that a basic
deterministic approach preserving the $N$ most likely particles is
the most efficient method! This deserves further study. This does not
mean that particle filtering methods are of no use in communication
systems. Indeed, in most cases, the deterministic approach as well as
the one proposed in [6] cannot be applied as they are too
computationally expensive. In this case, particle filtering based on
sampling importance resampling is relevant but requires the design of
a "clever" importance distribution and/or the use of MCMC steps; see
[4] for details.

ABSTRACT
Recently, a new Markov chain based algorithm for drawing samples from
a desired distribution has been proposed. This algorithm, known as
the perfect sampling algorithm, can determine exactly when a Markov
chain enters equilibrium, and hence can output exact samples. In this
paper, we introduce a perfect sampling algorithm called the rejection
Gibbs coupler for perfect sampling from bounded multivariate
distributions. We demonstrate an application of the rejection Gibbs
coupler to the generation of samples from truncated multivariate
Gaussian distributions.

1.  INTRODUCTION 

In the past decade, research in Markov chain Monte Carlo (MCMC)
sampling has drawn much attention in the statistical and signal
processing communities. In particular, the use of MCMC sampling has
revived the interest in using the Bayesian methodology for solving
various practical problems.

Diagnosis of convergence of Markov chains, however, remains a
challenging problem. As a result, samples obtained by MCMC methods
can only be considered approximately rather than exactly distributed
according to a desired distribution. In 1996, Propp and Wilson [1]
proposed a solution to this problem such that the convergence time of
a Markov chain can be exactly determined. Thus the samples produced
thereafter are exact samples from the desired distribution. This
algorithm is named coupling from the past (CFTP). Since then,
research on further development of the CFTP algorithm has quickly
picked up. The original CFTP was designed for discrete variable
spaces. A successful extension of CFTP to continuous variable spaces
was introduced by Murdoch and Green [2], where several algorithms
such as the multigamma coupler, the rejection coupler, and the
Metropolis coupler were proposed. In addition, the possibility of
constructing a Gibbs-sampler-like perfect sampling algorithm was also
demonstrated.

In [3], we proposed a novel perfect sampling algorithm called the
Gibbs coupler. The proposed algorithm on high dimensional binary
spaces overcomes the obstacle of the original CFTP in that it can be
efficiently implemented regardless of the existence of
(anti-)monotonic Markov chains. Applications of the Gibbs coupler
were shown for problems on variable selection [3] and multiuser
detection of CDMA systems [4].

In this paper, we introduce a new version of the Gibbs coupler termed
the rejection Gibbs coupler. The rejection Gibbs coupler combines the
idea of the rejection coupler with the framework of the general Gibbs
coupler and aims at sampling from bounded multivariate distributions.
We first outline the rejection Gibbs coupler, and then we discuss the
partitioning technique, which is important for practical
implementation of the algorithm. Finally, we show how the Gibbs
coupler can be applied to draw samples from truncated multivariate
Gaussian distributions. Simulation results are also provided to show
the performance of the rejection Gibbs coupler.

2.  COUPLING  FROM  THE  PAST 

CFTP, similarly to the MCMC sampling methods, generates samples
from a desired distribution by using Markov chains. However,
CFTP constructs not a single but multiple Markov chains and
utilizes the concept of coupling. In a coupling process, at any
transition, the same update function and random number are assigned
to all the Markov chains. In CFTP, the coupling process is
implemented from the past to time 0. The CFTP algorithm can be
described as an iterative scheme by the following pseudocode:

CFTP(T)
    t ← -T,  B_t ← S
    while t < 0
        t ← t + 1
        B_t ← φ(B_{t-1}, u_t)
    if |B_t| = 1 then
        return(B_0)
    else
        CFTP(2T)

In the above pseudocode, $S$ denotes a desired discrete state space
with size $M = |S|$, and $u_t$ and $\phi(\cdot)$ are the random seed
and update function, respectively. At the start of each iteration,
CFTP initiates $M$ Markov chains at every possible state of the state
space $S$ from some time $-T$ in the past, couples them together,
and runs towards time 0. Then at time 0, the coalescence of the
chains is checked. Note that all the Markov chains should
have the desired distribution as their stationary distribution. Now,
if all the chains have coalesced to the same state at time 0, the
coalesced state is then a perfect sample from the desired distribution.
This is because if we started the algorithm from the infinite past
but kept the existing random seeds of the transitions from $-T$ to
0, the Markov chains would still have coalesced into the same state at
time 0. Since the chains would then have been propagated
from the infinite past, the coalesced state at time 0 is a steady state
which follows the desired distribution exactly.
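
The CFTP scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the update function `phi` (a reflecting random walk on a five-state space, whose stationary distribution is uniform) and the names `cftp` and `seeds` are hypothetical and do not come from the paper. The point the sketch makes is that the random numbers for transitions close to time 0 are stored and reused whenever the start time is pushed further back.

```python
import random


def phi(state, u, n_states):
    """Assumed update function: reflecting random walk on {0, ..., n_states - 1}."""
    step = -1 if u < 0.5 else 1
    return min(max(state + step, 0), n_states - 1)


def cftp(n_states, rng, T=1):
    """Return one exact draw from the stationary law of the chain driven by phi."""
    seeds = []  # seeds[k] drives the transition from time -(k + 1) to time -k
    while True:
        # Extend the seeds further into the past; existing ones are reused as-is.
        while len(seeds) < T:
            seeds.append(rng.random())
        states = set(range(n_states))       # B_{-T} = S: one chain per state
        for k in range(T - 1, -1, -1):      # run coupled chains from -T up to 0
            states = {phi(x, seeds[k], n_states) for x in states}
        if len(states) == 1:                # all chains coalesced at time 0
            return states.pop()
        T *= 2                              # otherwise restart from time -2T


rng = random.Random(0)
draws = [cftp(5, rng) for _ in range(2000)]
```

Because the walk's transition matrix is symmetric, the empirical frequencies of `draws` should be close to 1/5 for each state.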

Notice that CFTP is proposed primarily for problems with finite
discrete variable spaces. A direct extension of CFTP to continuous
variable spaces is prohibitive since the size of a continuous
variable space is infinite, and thus it will take CFTP infinite time
to reach coalescence. To allow for perfect sampling from continuous
variable spaces, special care must be taken to map infinite-size
continuous variable spaces into finite discrete variable spaces [2].
In the next section, we introduce an algorithm that achieves perfect
sampling from bounded multivariate distributions.

3.  THE  REJECTION  GIBBS  COUPLER 

Suppose that we want to draw samples from an $N$-dimensional
multivariate distribution $p(\mathbf{x})$ defined on a variable space $S$. To
apply the rejection Gibbs coupler to the problem, the full conditional
distributions $p(x_i \mid \mathbf{x}_{-i})$ $\forall i$ are required to be specified,
where $\mathbf{x}_{-i}$ represents the vector of the $N - 1$ variables in $\mathbf{x}$ except
for the $i$-th variable. Moreover, we assume that upper bound
and lower bound functions can be determined at every instant of
time $t$ such that

$$h_i^{(t)}(x_i) = \max_{\mathbf{x}_{-i} \in S_{-i}^{(t)}} g(x_i \mid \mathbf{x}_{-i}) \tag{1}$$

and

$$r_i^{(t)}(x_i) = \min_{\mathbf{x}_{-i} \in S_{-i}^{(t)}} g(x_i \mid \mathbf{x}_{-i}) \tag{2}$$

where $S_{-i}^{(t)} \subset S_{-i}$, and $g(x_i \mid \mathbf{x}_{-i})$ is a function proportional to
$p(x_i \mid \mathbf{x}_{-i})$. Notice that the detailed expression of $g(x_i \mid \mathbf{x}_{-i})$ can vary
by including or removing terms with respect to $\mathbf{x}_{-i}$ (since these
terms are considered part of the proportionality constant). A different
expression of $g(x_i \mid \mathbf{x}_{-i})$ will eventually affect the complexity of the
algorithm. Generally, there are two guiding principles for choosing
the function $g(x_i \mid \mathbf{x}_{-i})$. First, $g(x_i \mid \mathbf{x}_{-i})$ should be in a form
that makes $h_i(\cdot)$ and $r_i(\cdot)$ easy to determine. Second, the
corresponding distribution $h_i(x)/v$ should be easy to sample from,
where $v = \int h_i(x)\,dx$ is the normalizing constant. Now, once the
bound functions are determined, the rejection Gibbs coupler
proceeds according to the outline displayed
in Chart I.

In the algorithm, $h_i^{(t)}$ and $r_i^{(t)}$ are also determined according
to (1) and (2). Notice that the general framework of the algorithm
still follows that of CFTP. However, the detailed coupling scheme is
based on the rejection coupler. Typically, the ratio $\rho_i^{(t)} = r_i^{(t)} / h_i^{(t)}$
is a key factor in defining the speed of coalescence of the
algorithm. This is because, on average, the algorithm generates
$1/\rho_i^{(t)}$ samples at time $t$ for the $i$-th component. Therefore, the
larger the $\rho_i^{(t)}$, the fewer samples the algorithm produces
and hence the faster the coalescence.

Chart I.

Rejection Gibbs coupler(T):
    t ← -T,  S^(t) ← S
    while t < -T/2
        t ← t + 1
        for i = 1, 2, ..., N
            determine h_i^(t)(x_i) and r_i^(t)(x_i) w.r.t. S_{-i}^(t)
            j ← 0
            repeat
                j ← j + 1
                draw u_ij^(t) from U(0, 1)
                draw X_ij from h_i^(t)(x_i)/v
                if u_ij^(t) < r_i^(t)(X_ij) / h_i^(t)(X_ij) then
                    J ← j
                    exit repeat
            S_i^(t) ← {X_i1, X_i2, ..., X_iJ}

    while t < 0
        t ← t + 1
        for i = 1, 2, ..., N
            determine h_i^(t)(·) and r_i^(t)(·) w.r.t. S_{-i}^(t)
            j ← 0
            repeat
                j ← j + 1
                for X_j ∈ S_i^(t)
                    k ← 0
                    if u_ij^(t) < h_i^(t)(X_j) / h_i(X_j) then
                        k ← k + 1
                        X_k ← X_j
                if u_ij^(t) < r_i^(t)(X_j) / h_i^(t)(X_j) then
                    exit repeat
            K ← k,  S_i^(t) ← {X_1, X_2, ..., X_K}
    if size of S_i^(0) is equal to 1, for i = 1, 2, ..., N then
        return(S^(0))
    else
        Rejection Gibbs coupler(2T)


4.  THE  PARTITIONING  TECHNIQUE 

In practice the ratio $\rho_i^{(t)}$ is often very small, which leads
to very slow convergence of the algorithm. To circumvent this
difficulty, one can divide the variable space $S_{-i}^{(t)}$ into a collection
of disjoint cells, or partitions [2], and specify the upper bound and
lower bound functions for each partition. The partition is done
in a way that the ratio for the $l$-th partition, $\rho_{il}^{(t)}$, is large enough to
guarantee a reasonable mean number of samples. For instance, one
can impose $M$ partitions on $S_{-i}^{(t)}$ and for each partition set
$\rho_{il}^{(t)} = (\rho_i^{(t)})^{1/M}$. Then the average number of samples produced at
time $t$ will be $M / (\rho_i^{(t)})^{1/M}$. As a simple illustration, if $\rho_i^{(t)} = 10^{-5}$
and $M = 5$, then $\rho_{il}^{(t)} = 0.1$ and, after partitioning, 50 samples
are produced on average. If we compare this average with the
average of $1/\rho_i^{(t)} = 10^5$ samples before partitioning, the number
of produced samples is reduced by a factor of 2000. Equivalently, we
can say that the partitioned algorithm is 2000 times faster.
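
The arithmetic of this illustration can be checked with a short Python sketch. The function name `expected_samples` is hypothetical; the formula is the one stated in the text, namely M divided by the M-th root of the unpartitioned ratio.

```python
def expected_samples(rho, n_partitions=1):
    """Mean number of samples when each of M cells has ratio rho ** (1 / M)."""
    per_cell_ratio = rho ** (1.0 / n_partitions)
    return n_partitions / per_cell_ratio


before = expected_samples(1e-5)       # no partitioning: 1 / rho
after = expected_samples(1e-5, 5)     # M = 5 partitions
speedup = before / after
```

With these inputs, `before` is about 10^5, `after` is about 50, and `speedup` is about 2000, up to floating-point rounding.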

However, if $M$ is determined at the first step of the algorithm,
that is, at $t = -T$, and then fixed afterwards, it is possible that at
a certain time instant $t$ the mean number of samples $1/\rho_i^{(t)}$ is
already less than $M$. Since the partitioning algorithm with fixed
$M$ would produce at least $M$ samples, it introduces more samples
at time $t$ than the nonpartitioning algorithm. Consequently, in this
case, the partitioning algorithm would be slower to coalesce. There
are two remedies to this problem. With the first one, one can fix
the detailed range for each of the $M$ partitions at the first step of
the algorithm, and the range of each partition will remain fixed
later on in the algorithm. Then at any time $t$, the average number
of samples produced for each partition would not change. With
the second remedy, one can fix the value of the per-partition ratio
$\rho$ for all $t$. In that case, the number of partitions $M$ will change
from time to time on different $S_{-i}^{(t)}$. Under this scheme, as long as
$1/\rho_i^{(t)} > 1$, the problematic scenario described above no longer
occurs. This option leads to an adaptive partitioning algorithm.


5.  PERFECT  SAMPLING  FROM  TRUNCATED 
MULTIVARIATE  GAUSSIAN  DISTRIBUTIONS  BY  THE 
REJECTION  GIBBS  COUPLER 

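In this section, we demonstrate the use of the rejection Gibbs
coupler for drawing perfect samples from truncated multivariate
Gaussian distributions. First, let $\mathbf{x} = [x_1\ x_2\ \cdots\ x_N]^T \in S^N$ represent
a vector of $N$ random variables which is distributed according to
the truncated multivariate Gaussian $TN(\boldsymbol{\mu}, \boldsymbol{\Sigma}, S^N)$, where $\boldsymbol{\mu}$ and
$\boldsymbol{\Sigma}$ are the $N \times 1$ mean vector and the $N \times N$ covariance
matrix of the corresponding non-truncated Gaussian distribution, and
$S^N = \bigcup_{i=1}^{N} [a_i, b_i]$. Next, we rearrange $\mathbf{x}$ by $\mathbf{x} = [x_i, \mathbf{x}_{-i}^T]^T$, and
partition $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ accordingly by

$$\boldsymbol{\mu} = \begin{bmatrix} \mu_i \\ \boldsymbol{\mu}_{-i} \end{bmatrix}$$

and

$$\boldsymbol{\Sigma} = \begin{bmatrix} \sigma_i^2 & \boldsymbol{\Sigma}_{i,-i} \\ \boldsymbol{\Sigma}_{-i,i} & \boldsymbol{\Sigma}_{-i} \end{bmatrix}.$$

Then, it can be shown that the full conditional distributions $p(x_i \mid \mathbf{x}_{-i})$
are also truncated Gaussians that can be expressed as

$$p(x_i \mid \mathbf{x}_{-i}) = TN(\tilde{\mu}_i, \tilde{\sigma}_i^2, (a_i, b_i]) \propto \exp\left\{ \frac{1}{2\tilde{\sigma}_i^2} \left( 2\boldsymbol{\Sigma}_{i,-i}\boldsymbol{\Sigma}_{-i}^{-1}(\mathbf{x}_{-i} - \boldsymbol{\mu}_{-i})\,x_i + 2\mu_i x_i - x_i^2 \right) \right\} = g(x_i \mid \mathbf{x}_{-i}) \tag{3}$$

where $\tilde{\mu}_i = \mu_i + \boldsymbol{\Sigma}_{i,-i}\boldsymbol{\Sigma}_{-i}^{-1}(\mathbf{x}_{-i} - \boldsymbol{\mu}_{-i})$ and $\tilde{\sigma}_i^2 = \sigma_i^2 - \boldsymbol{\Sigma}_{i,-i}\boldsymbol{\Sigma}_{-i}^{-1}\boldsymbol{\Sigma}_{-i,i}$.
To apply the rejection Gibbs coupler, we need to determine the
bounds on $p(x_i \mid \mathbf{x}_{-i})$ and specify the distribution that corresponds
to the upper bound. First, define two $N \times 1$ vectors $\mathbf{x}^{(t)\max}$ and
$\mathbf{x}^{(t)\min}$, whose components are

$$x_j^{(t)\max} = \begin{cases} \arg\max_{x \in S_j^{(t)}} x & \text{if } \beta_j > 0 \\ \arg\min_{x \in S_j^{(t)}} x & \text{if } \beta_j < 0 \end{cases}$$

and

$$x_j^{(t)\min} = \begin{cases} \arg\min_{x \in S_j^{(t)}} x & \text{if } \beta_j > 0 \\ \arg\max_{x \in S_j^{(t)}} x & \text{if } \beta_j < 0 \end{cases}$$

where $\beta_j$ represents the $j$-th element of $\boldsymbol{\Sigma}_{i,-i}\boldsymbol{\Sigma}_{-i}^{-1}$, and $S_j^{(t)}$ denotes
the support of $x_j$ at time $t$. Next, let $\mathbf{x}_{-i}^{(t)\max}$ and $\mathbf{x}_{-i}^{(t)\min}$ represent
two $(N - 1) \times 1$ vectors which consist of all except the $i$-th
components in $\mathbf{x}^{(t)\max}$ and $\mathbf{x}^{(t)\min}$, respectively. Then the upper
and lower bounds on $g(x_i \mid \mathbf{x}_{-i})$ are found to be

$$h_i^{(t)}(x_i) = \begin{cases} \exp\left\{ \frac{1}{2\tilde{\sigma}_i^2} \left( 2\boldsymbol{\Sigma}_{i,-i}\boldsymbol{\Sigma}_{-i}^{-1}(\mathbf{x}_{-i}^{(t)\max} - \boldsymbol{\mu}_{-i})\,x_i + 2\mu_i x_i - x_i^2 \right) \right\} & \text{if } x_i > 0 \\ \exp\left\{ \frac{1}{2\tilde{\sigma}_i^2} \left( 2\boldsymbol{\Sigma}_{i,-i}\boldsymbol{\Sigma}_{-i}^{-1}(\mathbf{x}_{-i}^{(t)\min} - \boldsymbol{\mu}_{-i})\,x_i + 2\mu_i x_i - x_i^2 \right) \right\} & \text{if } x_i < 0 \end{cases} \tag{4}$$

and

$$r_i^{(t)}(x_i) = \begin{cases} \exp\left\{ \frac{1}{2\tilde{\sigma}_i^2} \left( 2\boldsymbol{\Sigma}_{i,-i}\boldsymbol{\Sigma}_{-i}^{-1}(\mathbf{x}_{-i}^{(t)\min} - \boldsymbol{\mu}_{-i})\,x_i + 2\mu_i x_i - x_i^2 \right) \right\} & \text{if } x_i > 0 \\ \exp\left\{ \frac{1}{2\tilde{\sigma}_i^2} \left( 2\boldsymbol{\Sigma}_{i,-i}\boldsymbol{\Sigma}_{-i}^{-1}(\mathbf{x}_{-i}^{(t)\max} - \boldsymbol{\mu}_{-i})\,x_i + 2\mu_i x_i - x_i^2 \right) \right\} & \text{if } x_i < 0. \end{cases} \tag{5}$$

Furthermore, the distribution corresponding to the upper bound
function can be shown to be a mixture of two truncated Gaussian
distributions and has the form

$$f_h(x_i) = w_{i1}\,TN(\mu_{i1}, \tilde{\sigma}_i^2, 0, b_i) + w_{i2}\,TN(\mu_{i2}, \tilde{\sigma}_i^2, a_i, 0) \tag{6}$$

where $w_{i1}$ and $w_{i2}$ are the weights assigned to the two mixands,
$\mu_{i1} = \boldsymbol{\Sigma}_{i,-i}\boldsymbol{\Sigma}_{-i}^{-1}(\mathbf{x}_{-i}^{(t)\max} - \boldsymbol{\mu}_{-i}) + \mu_i$, and
$\mu_{i2} = \boldsymbol{\Sigma}_{i,-i}\boldsymbol{\Sigma}_{-i}^{-1}(\mathbf{x}_{-i}^{(t)\min} - \boldsymbol{\mu}_{-i}) + \mu_i$.
The weights $w_{i1}$ and $w_{i2}$ are uniquely defined as

$$w_{i1} = c_1 / (c_1 + c_2) \tag{7}$$

and

$$w_{i2} = c_2 / (c_1 + c_2) \tag{8}$$

where

$$c_1 = \exp\{\mu_{i1}^2 / (2\tilde{\sigma}_i^2)\} \left( Q((0 - \mu_{i1})/\tilde{\sigma}_i) - Q((b_i - \mu_{i1})/\tilde{\sigma}_i) \right)$$

and

$$c_2 = \exp\{\mu_{i2}^2 / (2\tilde{\sigma}_i^2)\} \left( Q((a_i - \mu_{i2})/\tilde{\sigma}_i) - Q((0 - \mu_{i2})/\tilde{\sigma}_i) \right).$$

Here, $Q(y)$ is the Q-function, which represents the probability that
a $\mathcal{N}(0, 1)$ random variable exceeds $y$. Once we determine the
weights of the two mixands, sampling from $f_h(x_i)$ is easy to
accomplish and can proceed as follows:

    draw u from U(0, 1);
    if u < w_i1
        draw X_i from TN(μ_i1, σ̃_i^2, 0, b_i);
    otherwise
        draw X_i from TN(μ_i2, σ̃_i^2, a_i, 0);

Note that a sample $x$ from the univariate truncated Gaussian
distribution $TN(\mu, \sigma^2, a, b)$ is obtained by the inverse
transformation which computes

$$x = \sigma\,Q^{-1}\left( u \left( Q((a - \mu)/\sigma) - Q((b - \mu)/\sigma) \right) + Q((b - \mu)/\sigma) \right) + \mu$$

where $u$ is a sample from $U(0, 1)$.

Figure 1: Scattergram of 1000 perfect samples from
$TN(\mathbf{0}, [1\ 0.8;\ 0.8\ 1], \bigcup_{i=1}^{2} (-1, 1])$ by the Gibbs coupler.

Figure 2: Scattergram of 1000 perfect samples from
$TN(\mathbf{0}, [1\ 0.2;\ 0.2\ 1], \bigcup_{i=1}^{2} (-1, 1])$ by the Gibbs coupler.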
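
The inverse-transformation step and the two-component mixture draw of Section 5 can be sketched as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the Q-function is built from Python's `statistics.NormalDist`, and the argument of the inverse Q-function is written as u(Q(a') - Q(b')) + Q(b'), with a' and b' the standardised limits, so that the result always lies in [a, b].

```python
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # N(0, 1)


def q_function(y):
    """Probability that an N(0, 1) variable exceeds y."""
    return 1.0 - _STD_NORMAL.cdf(y)


def q_inverse(p):
    """Inverse of the Q-function for 0 < p < 1."""
    return _STD_NORMAL.inv_cdf(1.0 - p)


def truncated_gaussian_from_uniform(mu, sigma, a, b, u):
    """Map u in (0, 1) to a draw from TN(mu, sigma^2, a, b)."""
    q_a = q_function((a - mu) / sigma)
    q_b = q_function((b - mu) / sigma)
    return sigma * q_inverse(u * (q_a - q_b) + q_b) + mu


def sample_upper_bound_mixture(w1, mu1, mu2, sigma, a, b, rng):
    """Draw from w1*TN(mu1, sigma^2, 0, b) + (1 - w1)*TN(mu2, sigma^2, a, 0)."""
    if rng.random() < w1:
        return truncated_gaussian_from_uniform(mu1, sigma, 0.0, b, rng.random())
    return truncated_gaussian_from_uniform(mu2, sigma, a, 0.0, rng.random())


rng = random.Random(1)
mixture_draws = [
    sample_upper_bound_mixture(0.5, 0.3, -0.3, 0.6, -1.0, 1.0, rng)
    for _ in range(1000)
]
```

The parameter values in the usage lines are arbitrary illustrations, not the settings used in the experiments.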

6.  SIMULATION  RESULTS 

To demonstrate the performance of the proposed rejection Gibbs
coupler, several experiments were performed. In the first
experiment, 1000 perfect samples from the bivariate truncated Gaussian
$TN(\mathbf{0}, \boldsymbol{\Sigma}, \bigcup_{i=1}^{2} (-1, 1])$ were collected, where $\boldsymbol{\Sigma} = [1\ 0.8;\ 0.8\ 1]$.
The scattergram of the samples is displayed in Figure 1.

In the second experiment, the covariance matrix was $\boldsymbol{\Sigma} =
[1\ 0.2;\ 0.2\ 1]$. Again, 1000 perfect samples were collected. The
scattergram of the samples is shown in Figure 2.

In the third experiment, we examined the correlation between
samples obtained through the rejection Gibbs coupler. By using
the samples obtained in the first experiment, we calculated the
estimate of the autocorrelation coefficients for the first variable and the
crosscorrelation coefficients between the two variables. In addition, we
also applied the Gibbs sampler [6] and generated samples from
the bivariate truncated Gaussian with the same setting as that in
the first experiment. Similarly, we calculated the estimate of their
autocorrelation and crosscorrelation coefficients. The results are
shown in Figures 3 and 4. The figures clearly show that
the Gibbs sampler results in much larger correlations for adjacent
samples than the Gibbs coupler. This indicates that any inferences
carried out with the perfect samples generated through the rejection
Gibbs coupler will have a smaller variance than those carried out
with samples from the Gibbs sampler.
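
For readers who want to reproduce this kind of comparison, one common estimator of the sample autocorrelation coefficients is sketched below. The paper does not state which estimator was used, so the biased (divide-by-n) form and the function name `sample_autocorrelation` are assumptions.

```python
def sample_autocorrelation(series, max_lag):
    """Biased sample autocorrelation coefficients for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centred = [v - mean for v in series]
    c0 = sum(v * v for v in centred) / n   # lag-0 autocovariance
    return [
        sum(centred[t] * centred[t + k] for t in range(n - k)) / (n * c0)
        for k in range(max_lag + 1)
    ]


# A strictly alternating series is strongly negatively correlated at lag 1.
coefficients = sample_autocorrelation([1.0, -1.0] * 50, 3)
```

Independent draws, such as perfect samples, should give coefficients near zero at every non-zero lag, whereas consecutive Gibbs sampler outputs typically do not.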

7.  CONCLUSION 

We proposed an algorithm called the rejection Gibbs coupler for
perfect sampling from bounded multivariate distributions. As an
application, we showed the implementation of the proposed
algorithm on truncated multivariate Gaussian distributions. The
advantage of the rejection Gibbs coupler over the Gibbs sampler is
shown by the simulation results.

Figure 3: Plot of the sample autocorrelation coefficients of the first
variable.

Figure 4: Plot of the sample crosscorrelation coefficients between
the two variables.

ABSTRACT

Important in the application of Markov chain Monte
Carlo (MCMC) methods is the determination that a
search run has converged. Given that such searches
typically take place in high-dimensional spaces, there
are many pitfalls and difficulties in making such
assessments. In the present paper, we discuss the use
of phase randomisation as a tool in the MCMC context,
provide some details of its distributional properties for
time series which enable its use as a convergence
diagnostic, and contrast its performance with a selection of
other widely used diagnostics. Some brief comments on
analytical results, obtained via Edgeworth expansion,
are also made.

1.  INTRODUCTION 

MCMC methods support the application of Bayesian
statistical methods through permitting complex
distributions to be evaluated (specifically, by handling
theoretically intractable integrals of high-dimensional
probability density functions). Given the numerical and
geometrical complexity of MCMC methods, assessment
of convergence is a non-trivial task. Diagnostics for
convergence are required in practical settings, and thus
need to be accessible, accurate and fast.

In the theory of time series resampling, the method
of phase randomisation has been used to generate
so-called surrogate time series with the same first- and
second-order properties as the original: see Theiler et
al. (1992) and Timmer (1998), as well as Davison and
Hinkley (1997) who use the term phase scrambling,
and Braun and Kulperger (1997) who use the term
Fourier bootstrap. In essence, one takes the discrete
Fourier transform of a time series, replaces the phase
with a new phase randomly chosen from the interval
$(0, 2\pi)$, and back-transforms to obtain a new time
series. Second-order properties are maintained by virtue
of retaining the original amplitudes at their original
locations in the original spectral estimate. If the original
time series has an asymmetric marginal distribution
then appropriate adjustments can be made in
accordance with the so-called rescaling methods of Davison
and Hinkley (1997), otherwise the standard algorithm
suffices.

The algorithms are as follows. Denote the original
series (of length $n$) as the array $x[t]$ with ranks $r_t$
among the original unordered series.

Standard Algorithm

1. Compute the Discrete Fourier Transform $z[t] =
DFT(x[t])$.

2. Randomise the phases; that is, randomly choose
$\phi[t]$ from the uniform distribution on $(0, 2\pi)$, and
put $z'[t] = z[t] \exp(i\phi[t])$.

3. Symmetrise the phases such that $\mathrm{Re}(z''[t]) =
\mathrm{Re}(z'[t] + z'[n + 1 - t])/2$ and also $\mathrm{Im}(z''[t]) =
\mathrm{Im}(z'[t] - z'[n + 1 - t])/2$.

4. Invert, putting $x'[t] = DFT^{-1}(z''[t])$.

5. The resulting series $x'[t]$ is the surrogate.
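
A minimal Python sketch of phase randomisation is given below. It is a common variant rather than a line-by-line transcription of the Standard Algorithm: instead of the averaging in step 3, it imposes conjugate symmetry directly on each pair of mirrored frequencies, which keeps the surrogate real-valued and leaves every Fourier amplitude unchanged. The naive O(n^2) transform and all function names are assumptions made to keep the example self-contained.

```python
import cmath
import math
import random


def dft(x):
    """Naive discrete Fourier transform (fine for short series)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def inverse_dft(z):
    """Naive inverse discrete Fourier transform."""
    n = len(z)
    return [
        sum(z[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def phase_randomised_surrogate(x, rng):
    """Return a real surrogate with the same Fourier amplitudes as x."""
    n = len(x)
    z = dft(x)
    z_new = list(z)  # frequency 0 (and n/2 when n is even) stay untouched
    for k in range(1, (n - 1) // 2 + 1):
        phase = rng.uniform(0.0, 2.0 * math.pi)
        z_new[k] = z[k] * cmath.exp(1j * phase)   # rotate by a random phase
        z_new[n - k] = z_new[k].conjugate()       # mirrored partner
    return [value.real for value in inverse_dft(z_new)]


rng = random.Random(3)
original = [rng.gauss(0.0, 1.0) for _ in range(16)]
surrogate = phase_randomised_surrogate(original, rng)
```

Since the amplitudes are unchanged, the surrogate shares the periodogram, and therefore the sample mean and circular autocovariances, of the original series.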

Rescaling Algorithm

1. Let $y_t = \Phi^{-1}\{r_t/(n + 1)\}$, where $\Phi$ is the empirical
distribution function of the original unordered
series.

2. Apply the Standard Algorithm to $y_1, \ldots, y_n$, giving
$Y_1^*, \ldots, Y_n^*$ (see above).

3. Set the surrogate series to be $X_t^* = x_{(r_t')}$, where
$r_t'$ is the rank of $Y_t^*$ among $Y_1^*, \ldots, Y_n^*$.

One can use surrogate time series to test a null
hypothesis that the original series arises from a linear,
stochastic, Gaussian stationary process. (Note that
the rejection of this hypothesis covers a wide range of
alternatives.) If a statistic from the original series is
denoted as $V_0$ and the corresponding statistic from the
$j$-th surrogate is denoted as $V_j$, with $E(V_j) = \mu_V$ and
$\mathrm{var}(V_j) = \sigma_V^2$, then one may use as the test statistic
$|V_0 - \mu_V| / \sigma_V$, and calibrate against a Normal
distribution, if appropriate. Timmer (1998) has illustrated
this using the correlation dimension as the underlying
statistic in the context of cyclostationary processes,
demonstrating power to reject the null hypothesis in
the presence of non-stationarity.
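
As a small illustration of the test statistic, the sketch below estimates the surrogate mean and standard deviation from a finite set of surrogate statistics. Using the sample mean and sample standard deviation in place of the theoretical values is an assumption of this example, as is the function name `surrogate_test_statistic`.

```python
from statistics import mean, stdev


def surrogate_test_statistic(v0, surrogate_stats):
    """Return |V0 - mu_V| / sigma_V with mu_V, sigma_V estimated from surrogates."""
    mu_v = mean(surrogate_stats)
    sigma_v = stdev(surrogate_stats)   # sample standard deviation
    return abs(v0 - mu_v) / sigma_v


# Toy numbers only: three surrogate statistics and one original statistic.
z = surrogate_test_statistic(5.0, [1.0, 2.0, 3.0])
```

A large value of `z` relative to Normal quantiles would point towards rejecting the null hypothesis, provided the Normal calibration is appropriate.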

2.  PHASE  RANDOMISATION  AND
STATIONARITY

Second order properties, and some marginal shape
properties, are known to be preserved under phase
randomisation, the latter if the scaling method is used.
We examine here the effect of phase randomisation on
higher order moments and cumulants of a time series,
to determine if conditions on linearity and
stationarity are related to preservation of higher order
properties under phase randomisation. In particular,
for a time series $\{X_t\}$ with marginal mean $\mu$, we treat
higher central moments, of the form $E\{(X_t - \mu)^r\}$; higher
order cumulants, of the form $E\{\prod_{j=1}^{r} (X_{t+k+j} - \mu)\}$;
and higher order cross cumulants, of the lagged
product form $E\{(X_t - \mu)^r (X_{t+k} - \mu)^r\}$. In each of these
forms, $r = 3, 4, \ldots$, $k = 1, 2, \ldots$, and standard
estimates were used in simulations.

Numerical experiments were based on some classical
linear and non-linear time series models, including
linear autoregression (AR), random walk (RW), bilinear
stationary (BS), bilinear non-stationary (BN), GARCH
stationary (GS), GARCH non-stationary (GN),
threshold autoregression stationary (TS) and threshold
autoregression non-stationary (TN). See Tong (1990) for a
detailed discussion on the form and properties of these
models.

Timeplots obtained from the numerical experiments
showed broad agreement with the original data sets,
and can be qualitatively compared as in the following
table (using the rescaling method).

Model   AR             RW            BS      BN
Note    same           more symm.    same    same

Model   GS             GN            TS      TN
Note    larger vals.   same          same    same

When comparing the stationary with non-stationary
models, it was sometimes possible to distinguish
between them on the basis of higher order moments: the
standard method produced zero values for odd
moments; however, the rescaling method produced small
values for the third moment for stationary series yet the
same value as in the original series for non-stationary
models. Thus, third order moments appear to have a
reasonably good discriminatory ability for stationarity,
and hence for convergence of MCMC procedures.

The behaviour of the higher order cumulants of the
surrogates can be summarised according to the
following table. The non-stationary models are shown in the
last four rows of the table.

Model   Original          Standard        Rescaling
AR      small             odd near zero   all zero
BS      near zero         odd near zero   near zero
GS      near zero         near zero       near zero
TS      odd near zero     odd near zero   small
RW      3rd, 4th small    odd smaller     small
BN      large             odd near zero   smaller
GN      large             smaller         smaller
TN      large             smaller         large

In addition, the distribution of higher order
cumulants can be revealing in questions of stationarity, as
the following table indicates. Modes refer to the
number of modes of the empirical density function of the
cumulants of surrogate series. (The standard method
showed the same results as the rescaling method for all
models.)

Model   Rescaling                   Mode
AR      odd, even unimodal symm.    near zero
BS      unimodal                    near zero
GS      unimodal, tails             near zero
TS      unimodal, tails             near zero
RW      multimodal                  non-zero
BN      unimodal, tail              non-zero
GN      multimodal, tails           non-zero
TN      multimodal                  non-zero

To summarise the results of these tables, we can
comment as follows. In the case of the standard
algorithm (i) higher order moments and cumulants are
preserved for linear, Gaussian, stationary processes; (ii)
higher moments are preserved for non-linear stationary
processes, but not so for some higher order cumulants;
(iii) second and cross-cumulants are not preserved for
moderate and large lags if the process is linear
non-stationary; and (iv) higher cumulants are not preserved
for non-linear non-stationary processes. In the case of
the rescaling algorithm (i) the method is inappropriate
for linear, Gaussian, stationary processes as second
order cumulants are not preserved; (ii) higher order
cumulants are not preserved for non-linear stationary
processes; (iii) higher order cumulants and cross-cumulants
are not preserved for linear non-stationary processes;
(iv) higher cumulants are substantially different from
the originals for non-linear non-stationary models; and
(v) smoothing densities of higher order cumulants are
multimodal, or at least unimodal with heavy tails, for
non-stationary processes, while remaining unimodal for
stationary processes.

It is on the above basis that convergence (i.e.,
stationarity) can be concluded from a run of an MCMC
algorithm. Nur et al. (2001) give further details of the
above methodology. In that paper, the methods were
applied to some well-known data sets, and were found to
reject convergence where some other less
dynamically-driven methods concluded convergence of chains.

3.  PHASE  RANDOMISATION  AS  AN 
MCMC  CONVERGENCE  DIAGNOSTIC:  AN 
EXAMPLE 

There is a variety of tests for convergence of MCMC
algorithms. Raftery and Lewis (1996) reduce the
output of a chain to a two-state Markov chain and
apply analytically explicit results to the modified output.
Clearly, this is a form of discretisation and there is the
possibility that important information about the
original process may be lost. Heidelberger and Welch (1983)
adopt a spectral analysis approach, as does Geweke
(1992). These and other algorithms are available in
the software package CODA (Best et al., 1995).

We briefly describe an analysis of a widely-used
'benchmark' data set, and compare the relative
performance of the existing methods with the present method.

The example concerns mortality rates in 12 hospitals
performing cardiac surgery on babies: see
Spiegelhalter et al. (1994). The authors proposed a random
effects model for the number of deaths, $r_j$, in
hospital $j$, with true unknown mortality probability
$p_j$, as follows: $r_j \sim \mathrm{Binomial}(p_j, n_j)$ $(j = 1, \ldots, 12)$,
$\operatorname{logit} p_j = b_j$, $b_j \sim N(\mu, \tau)$, $\tau = 1/\sigma^2$, $\mu \sim N(0, 10^{-6})$,
$\tau \sim \Gamma(10^{-3}, 10^{-3})$. The analysis was restricted to a
short run of 200 epochs. The timeplot of the MCMC
run appeared to be similar to a bilinear stationary time
series, based on the simulations we described in the
previous section. Smoothing densities of the higher
order cumulant estimates were plainly unimodal around
zero, and standard quantile plots ascertained Normality
of the surrogates' cumulants (supported strongly by
the Shapiro-Wilk test). We can thus conclude that the
MCMC algorithm has converged. This is supported by
the diag assessment in BUGS, by Raftery and Lewis'
test, and by Heidelberger and Welch's test. However,
Geweke's test fails for this example because of the very
short run, although it passes if a considerably longer
run is used.

A detailed discussion of this analysis, along with those of
two other data sets, is given by D Nur, KL Mengersen
and RC Wolff in an as yet unpublished manuscript. It
indicates that phase randomisation performs at least
as well as other existing methods in the assessment of
MCMC convergence and, moreover, that it is more
informative about higher order statistical structures, which
in turn can classify stationarity and linearity. Their
work also suggests that higher order cumulants from
surrogate time series appear to be asymptotically
Normally distributed, thus providing a route to robust
formal testing of convergence (stationarity) hypotheses,
and calibration thereof. There also appears to be
evidence that the Metropolis-Hastings algorithm results
in a Markov chain which is geometrically ergodic to the
average when the target density is log-concave in the
tails.

4.  THEORETICAL  ISSUES  FOR  PHASE 
RANDOMISATION 

To give the above methodology a firm theoretical basis,
it is required to prove that third (and higher order)
cumulants of a stochastic process can be bootstrapped
with accuracy $o(n^{-1/2})$. Results of Götze and Hipp
(1983) can be employed to verify this.

Let $\{\varepsilon_t\}$ be independent and identically distributed
(iid) random variables. Generalising the Wold
Decomposition Theorem for stationary processes, we write
$X_t = \mu + \sum_j b_j \varepsilon_{t-j} + \sum_j \sum_k b_{jk} \varepsilon_{t-j} \varepsilon_{t-k} + \cdots$, and clearly
$X_t$ is non-linear if any of the higher order coefficients
are non-zero.

We consider the formal Edgeworth expansion of
order $s - 2$ of the third cumulant of $X_t$, as follows. Define
$Y_{jkt} = X_t X_{t-j} X_{t-k} - \alpha_{kj}$, where $\alpha_{kj}$ is the theoretical
third cumulant of $X_t$. Let $Y_t$ denote the matrix form of
$Y_{jkt}$. Götze and Hipp (1994) obtain valid formal
Edgeworth expansions for sums of weakly dependent
random vectors, with error of approximation $o(n^{-(s-2)/2})$
if the moments of order $s + 1$ are bounded, a conditional
Cramér condition holds, and the random vectors can be
approximated by other random vectors which satisfy a
strong mixing condition and a Markov-type condition.
We extend their result as follows.

Assume the following.

(A1) Let $\{\varepsilon_t\}$ be an iid sequence such that $E(\varepsilon_t) = 0$,
$E(\varepsilon_t^2) = 1$, $E(|\varepsilon_t|^{q(s+1)}) < \infty$, for some $s > 3$,
$q > 1$.

(A2) For linear processes, $\sum_{j=m}^{\infty} |b_j| < c \exp(-\alpha m)$,
$\alpha > 0$, for all $m$ sufficiently large.

(A3) Let $f$ denote a strongly contracting and
continuously differentiable function, and let $\varepsilon_t$ have
density satisfying $E|f(\varepsilon_1, \ldots, \varepsilon_d)| < \infty$, $f$ being
positive and continuous.

(A4) $\Gamma = \lim_{n \to \infty} \left( n^{-1/2} \sum_{t=1}^{n} Y_t \right)$ exists and is
positive definite. Denote the quantity under the limit
as $S_n$.

Suppose that $|f(x)| < M(1 + |x|^{s_0})$ for every vector
$x$. If the assumptions as set out in Götze and Hipp
(1994) hold, then there exists $\delta > 0$ not depending on
$f$ and $M$, and, for any $k > 0$, there exists a constant
$C = C(M) > 0$ not depending on $f$, such that

$$\left| E f(S_n) - \int f \, d\psi_{n,s} \right| < C\,\omega(f, n^{-k}) + o\left(n^{-(s-2+\delta)/2}\right)$$

where $\psi$ is a functional of signed measures relating
to the determinant of $\Gamma$, the term $o(\cdot)$ depends on $f$
through $M$ only, and $\omega$ is a supremum operator on a
Lipschitz condition for $f$ constraining $y$ to be less than
$n^{-k}$ in norm.

Under conditions (A1) through (A4), the result holds
for $X_t$, and the required Edgeworth expansion can be
obtained.

In an as yet unpublished manuscript in preparation
by D Nur, RC Wolff and KL Mengersen, the conditions
for this theorem are being confirmed.

ABSTRACT

In this work we introduce importance sampling techniques for the
assessment of a class of open-loop digital phase modulation
receivers with random carrier phase tracking in additive white
Gaussian noise channels. We consider a symbol-by-symbol phase
detector consisting of a bank of nonlinear stochastic filters
tracking the random phase carrier and a decision algorithm driven by
the filters' innovations. For the irreducible error floor assessment
we use an importance sampling technique relying on large
deviations principles that results in a multiple mode simulation density.
The noisy operation of the receiver is addressed with an adaptive
importance sampling technique. Simulations yield practically the
same results obtained with conventional Monte Carlo with
remarkable time gains.


1.  INTRODUCTION 


Symbol-by-symbol detection and random carrier phase tracking in additive white Gaussian noise (AWGN) channels is a particular scenario considered in reference [1]. Focusing on this particular problem, in this paper we are interested in the development of importance sampling (IS) techniques for fast simulation of the proposed receiver. Fast simulation depends on the appropriate choice of a new simulation density, which may be a difficult task, particularly for highly nonlinear models that exhibit complicated error sets. This is the case analyzed in this paper. In [2] we presented IS results for the error floor operation of the receiver based on the error set knowledge and using large deviations theory (LDT) principles. To analyze the receiver behavior when observations are noisy, an adaptive importance sampling (AIS) technique must be used because the error set becomes unknown (see [3]).

The paper is structured as follows: Section 2 presents the communications model and some fundamental IS aspects. In Section 3 we derive the error set for density biasing using LDT, and present the main aspects of the AIS technique applied. Implementation aspects are also included in this section. In Section 4 we show the results of our IS analysis.


This  work  was  supported  by  Portuguese  program  Praxis  XXI,  under 
project  2/2.1/TIT/1583/95. 


2.  PROBLEM  FORMULATION 

2.1.  Dynamics  and  observation  model 

Consider the discrete base-band received signal sampled N times per kth symbol interval $[kT_s, (k+1)T_s]$ of duration $T_s$:

$$s_n = \exp\left[j\left(\theta_n^{(i)} + \phi_n\right)\right] + v_n, \quad n = 1, \dots, N$$

where $\theta_n^{(i)}$ is the digital phase sequence associated to one of M symbols, $a_i \in \{a_1, \dots, a_M\}$; $\phi_n$ is a discrete Brownian motion described by $\phi_n = \phi_{n-1} + \delta_n$, where $\delta_n$ is a zero mean white Gaussian sequence of variance $\sigma_\delta^2$; and $v_n$ is a complex zero mean white Gaussian sequence of variance $\sigma_v^2$.
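The observation model above can be simulated in a few lines. The following is a minimal sketch, not the authors' simulator: the function name `simulate_symbol`, its arguments, and the equal split of the noise variance between real and imaginary parts are assumptions made for illustration.

```python
import numpy as np

def simulate_symbol(theta, sigma_delta, sigma_v, phi0=0.0, rng=None):
    """Generate s_n = exp(j(theta_n + phi_n)) + v_n for one symbol interval.

    theta       : digital phase sequence of the transmitted symbol (length N)
    sigma_delta : standard deviation of the phase increments delta_n
    sigma_v     : standard deviation of the complex noise v_n
    phi0        : carrier phase at the end of the previous symbol (assumed)
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(theta, dtype=float)
    n = theta.size
    # Discrete Brownian motion: phi_n = phi_{n-1} + delta_n
    delta = rng.normal(0.0, sigma_delta, size=n)
    phi = phi0 + np.cumsum(delta)
    # Complex white Gaussian noise; variance assumed split equally
    # between the real and imaginary parts
    v = sigma_v * np.sqrt(0.5) * (rng.normal(size=n) + 1j * rng.normal(size=n))
    s = np.exp(1j * (theta + phi)) + v
    return s, phi, delta
```

Setting `sigma_v = 0` reproduces the error floor regime studied later, in which only the phase increments are random.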

2.2.  Receiver  description 

The receiver proposed in [1] consists of a bank of M 'matched' stochastic nonlinear filters (NLF) driven by the same input $s_n$ and a decision algorithm driven by the filters' innovations processes. The detector decides, at the end of the current symbol interval, according to a minimum Euclidean metric computed from those innovations. Parameters of the selected NLF are used as initial conditions to all NLFs for the next symbol interval (see [1] for details). This corresponds to a symbol aided decision criterion.

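Matching to symbol $a_i$ consists of eliminating the modulating sequence from the observation vector, giving rise to observations denoted by $z_n^{(i)}$. The NLF$^{(i)}$ propagates recursively probability densities of phase $\phi_n$ conditioned on the observations $z_n^{(i)}$. Densities are represented, for this scalar phase process, as Tikhonov functions with mean $\hat\phi$ and concentration parameter $\gamma$. Propagation is accomplished in two steps, filtering (F) and prediction (P), implementing the following equations:

•  Filtering

$$\hat\phi_n^{F} = \arctan\left[\frac{z_{2,n}^{(i)} + \gamma_n^{P}\sigma_v^2 \sin\hat\phi_n^{P}}{z_{1,n}^{(i)} + \gamma_n^{P}\sigma_v^2 \cos\hat\phi_n^{P}}\right] \qquad (1)$$

$$\gamma_n^{F} = \left[\frac{|z_n^{(i)}|^2}{\sigma_v^4} + \frac{2\gamma_n^{P}}{\sigma_v^2}\left(z_{1,n}^{(i)}\cos\hat\phi_n^{P} + z_{2,n}^{(i)}\sin\hat\phi_n^{P}\right) + \left(\gamma_n^{P}\right)^2\right]^{1/2} \qquad (2)$$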
•  Prediction

$$\hat\phi_{n+1}^{P} = \hat\phi_n^{F} \qquad (3)$$

$$\gamma_{n+1}^{P} = \gamma^{P}\left(\gamma_n^{F}, \sigma_\delta^2\right) \qquad (4)$$

For next symbol processing, all the M NLFs are initialized with parameters $(\hat\phi_{N+1}^{P}, \gamma_{N+1}^{P})$ from the elected branch.

2.3. Modeling aspects and IS fundamentals

Fig. 1 is a schematic representation of the communication model considered in the previous subsections. In this figure $A_k = a_i$ is the transmitted symbol with $a_i \in \{a_1, \dots, a_M\}$, and $Y_{N_k}$ and $S_{N_k}$ are the transmitted and received signal vectors respectively, with N samples each during symbol interval $[kT_s, (k+1)T_s]$. $V_{N_k} = [v_1, \dots, v_N]_k$ is the AWGN vector, and $\Delta_{N_k} = [\delta_1, \dots, \delta_N]_k$ is the phase increment vector.

Fig. 1. Model for system simulation

Since the symbol detector propagates density parameters from the (k-1)th symbol interval to the kth one, an observation record of size L

$$S^L = [S_k, S_{k-1}, \dots, S_{k-L+1}], \quad L > 1 \qquad (5)$$

$$S = S_k, \quad L = 1 \qquad (6)$$

influences estimate $\hat A_k$ through the initialization variable $I_k$. Accordingly, we define the associated transmitted signal and disturbance records $Y^L$ and $U^L = [\Delta^L, V^L]$ respectively. In IS simulation, variable $I_k$, which is normally hidden during Monte Carlo (MC) simulation, will be generated to model the decision feedback mechanism.

The unbiased MC error probability estimator is

$$\hat P_e = \frac{1}{N_{MC}} \sum_{k=1}^{N_{MC}} g\left(A_k, \hat A_k\right)$$

where $g(A_k, \hat A_k)$ is one if $\hat A_k \ne A_k$ and zero otherwise, $N_{MC}$ being the number of simulation runs. For i.i.d. errors $\mathrm{var}\{\hat P_e\} = \sigma_{MC}^2 = P_e(1 - P_e)/N_{MC}$. IS simulation is intended to reduce high values of $\sigma_{MC}/P_e$ associated with low $P_e$. For this, we must modify the simulation density $p([Y,U])$, obtaining $p^*([Y,U]^*)$ to generate the records $[Y,U]^*$ in order to obtain more frequent errors - the important and otherwise rare events. Although biased simulation densities lead naturally to biased error rate estimates, IS provides appropriate correction for each error event by means of the likelihood ratio $W([Y,U]^*) = p([Y,U]^*)/p^*([Y,U]^*)$. This yields the unbiased error rate IS estimator

$$\hat P_e^* = \frac{1}{N_{IS}} \sum_{l=1}^{N_{IS}} 1_E\left([Y,U]_l^*\right) W\left([Y,U]_l^*\right) \qquad (7)$$

where $1_E(\cdot)$ is the indicator function for the error set E. When minimizing the IS estimator variance

$$\sigma_{IS}^2 = \frac{1}{N_{IS}}\left[\iint 1_E(y,u)\, W(y,u)\, p(y,u)\, dy\, du - P_e^2\right]$$

for a given $N_{IS}$, we act on $W(\cdot)$ through $p^*(\cdot)$.

Note that in (7) each simulation sample is in general a record with $U^L = [\Delta^L, V^L]$. This is not the case for $P_e$ under conventional MC, and the difference between the approaches relies on the type of simulation - stream simulation for MC, and error event simulation for IS. Error event simulation, introduced in [4], is considered especially appropriate for IS in the presence of memory effects. It consists of generating independent realizations of $U^L$ for a given information pattern $A^L$ while testing $\hat A_k$ for error occurrence. We have modelled part of $U^L$ through $I$, making $U^1 = [\Delta^1, V^1, I]$. Bias applied to the simulation density will be conditioned on each pattern $A^L$ belonging to a finite denumerable set of configurations.

3. IMPORTANCE SAMPLING PROCEDURE

3.1. Error set derivation

The non-linear recursion in equations (1) to (4), along with the decision algorithm, precludes in general the error set analysis. Restricting our analysis to the space of the random phase increments, denoted by $\mathcal{D}$ ($\sigma_\delta^2 \ne 0$, $\sigma_v = 0$), we obtain a simpler model that is analytically tractable. For simplicity we derive here the error set for the binary case (M = 2). The filter equations in the error floor are:

$$\hat\phi_n^{F} = \arg\left(z_n^{(s)}\right) = (\phi_n)_{2\pi}, \qquad \gamma_n^{F} = \infty$$

$$\hat\phi_{n+1}^{P} = \hat\phi_n^{F}, \qquad \gamma_{n+1}^{P} = \gamma^{P}\left(\sigma_\delta^2\right)$$

In the receiver, the branch t decision metric conditioned on transmitted $a_s$ (index t refers to target - the symbol which is to be detected instead of $a_s$) becomes

$$\pi_{t|s} = \sum_{n=1}^{N} \left| e^{j\left(\theta_n^{(s)} + \phi_n\right)} - e^{j\theta_n^{(t)}} e^{j\hat\phi_n^{P(t)}} \right|^2 \qquad (8)$$


where $\hat\phi_n^{P(t)} = \theta_{n-1}^{(s)} - \theta_{n-1}^{(t)} + \phi_{n-1}$ and $\hat\phi_1^{P(t)}$ is the receiver initialization for all NLFs. In the error floor, $\hat\phi_1^{P(t)}$ is the sum of the random phase $\phi_0$ (previous to $\phi_1$) with an error term $I_{t|s} = \theta_N^{(s)} - \theta_N^{(t)}$ resulting from the (k-1)th decision feedback - erroneous detection of symbol $a_t$ when the transmitted symbol was $a_s$.

We now define

$$E^{\mathcal{D}}_{t|s,I} = \left\{\Delta_N \in \mathbb{R}^N : \pi_{s|s} > \pi_{t|s}\right\}$$

as the error set in $\mathcal{D}$ conditioned on transmission of symbol $a_s$ and initialization error $I_{t|s}$.

The equation $\pi_{s|s} = \pi_{t|s}$ defining the boundary $\partial E^{\mathcal{D}}_{t|s,I}$ cannot be solved in general. However, we may identify $\partial A$ as the denumerably infinite set of solutions satisfying

$$\cos\left(\delta_1 - I_{t|s}\right) = \cos\left([\ldots] + \delta_1 - I_{t|s}\right) \qquad (9)$$

$$\cos\delta_n = \cos\left([\ldots] + \delta_n\right), \quad n = 2, \dots, N \qquad (10)$$

where


There is a finite collection of solutions $\partial A_{2\pi} \subset \partial A$ containing only $2^N$ elements, obtained by the intersection $\partial A_{2\pi} = \partial A \cap \{\Delta_N \in [-\pi, \pi] \times \cdots \times [-\pi, \pi]\}$. Any single element in $\partial A_{2\pi}$ allows the derivation of the remaining $2^N - 1$ elements and also of the symmetry center of the error set, which we designate by $C_{\partial A}$. We are able to identify a point in $\partial A_{2\pi}$ for each one of the $2^N$ quadrants $Q_i$ with respect to $C_{\partial A}$. In general $C_{\partial A}$ does not coincide with the origin of $\mathcal{D}$. As an example of such an error set, we show in Fig. 2 a diagram for N = 2 obtained by random generation of samples in $\mathbb{R}^2$. The solution set $\partial A_{2\pi} = \{\Delta_{S1}, \Delta_{S2}, \Delta_{S3}, \Delta_{S4}\}$ is also represented. The error region presents a periodic structure, generally non-connected and extending all over $\mathcal{D}$.
Fig. 2. Error set example for N = 2. Gaussian sampled diagram


3.2.  Large  deviations  and  density  biasing 

Vector $\Delta_N \in \mathcal{D}$ is generated according to the simulation density

$$p(\Delta_N) = \left(\frac{1}{\sqrt{2\pi}\,\sigma_\delta}\right)^{N} \exp\left(-\frac{\|\Delta_N\|^2}{2\sigma_\delta^2}\right) \qquad (11)$$

Identification of $E^{\mathcal{D}}_{t|s,I} \subset \mathcal{D}$ shows that the dominating point (DP - see [5], Definition 2) does not exist for the case under analysis. The DP would be used as a shift term for $p(\Delta_N)$ to obtain $p^*(\Delta_N)$ (a particular case of exponential tilting). When there is no DP, Theorem 2 in [5] states the conditions for the use of a set of points $\{\nu_1, \dots, \nu_m\}$ for the mean translation of a finite number of terms that will constitute the biased density. This results in $p^*(\cdot)$ being a mixture of Gaussian terms for appropriate coverage of $E^{\mathcal{D}}_{t|s,I}$. The set $\{\nu_1, \dots, \nu_m\}$ must contain at least all the minimum rate points (MRPs) of E (see also Definition 2 in [5]) and other points which, when used as bias vectors, will improve the coverage of E with the biased density $p^*(\cdot)$. Taking advantage of the symmetry shown by $E^{\mathcal{D}}_{t|s,I}$ with respect to $C_{\partial A}$, we seek the quadrantwise minimization of the Euclidean norm of $\Delta_N \in \partial E$ in order to find all the MRPs with respect to the sets $E_{Q_i} = E^{\mathcal{D}}_{t|s,I} \cap Q_i$. Due to the large number of solutions, we selected for simulation biasing only the $N_m$ solutions with the smallest Euclidean norms.
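The idea of a biased density built as a mixture of mean-shifted Gaussian terms can be illustrated on a toy problem. The sketch below is not the receiver simulation: it assumes a two-dimensional Gaussian vector, a two-sided error set with two minimum rate points, and equal sampling probabilities for the mixture terms. The function name `mixture_is_estimate` and all parameters are hypothetical.

```python
import math
import numpy as np

def mixture_is_estimate(in_error_set, shifts, sigma, n_runs, rng):
    """IS estimate of P(x in E), x ~ N(0, sigma^2 I), using an
    equal-weight mixture of mean-shifted Gaussians as biased density."""
    shifts = np.asarray(shifts, dtype=float)          # shape (m, d)
    m, d = shifts.shape
    # Draw each sample from one randomly chosen mixture term
    term = rng.integers(m, size=n_runs)
    x = shifts[term] + sigma * rng.normal(size=(n_runs, d))
    # Log densities up to a common constant that cancels in the ratio
    log_p = -np.sum(x**2, axis=1) / (2.0 * sigma**2)
    diff = x[:, None, :] - shifts[None, :, :]
    log_terms = -np.sum(diff**2, axis=2) / (2.0 * sigma**2)
    peak = log_terms.max(axis=1)
    log_p_star = peak + np.log(np.mean(np.exp(log_terms - peak[:, None]), axis=1))
    # Likelihood ratio W = p / p*, applied only to samples in the error set
    weights = np.exp(log_p - log_p_star)
    return float(np.mean(in_error_set(x) * weights))

# Toy error set {|x_1| > c}: its minimum rate points are (+c, 0) and (-c, 0)
c = 4.0
rng = np.random.default_rng(0)
estimate = mixture_is_estimate(lambda x: np.abs(x[:, 0]) > c,
                               [[c, 0.0], [-c, 0.0]], 1.0, 20000, rng)
exact = math.erfc(c / math.sqrt(2.0))
```

A single shifted term would cover only one half of this error set, which is the situation the multiple-term density is meant to avoid.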

3.3. Density biasing using AIS

Optimization of the IS density now consists of biasing in the product space $\mathcal{D} \times \mathcal{V} \times \mathcal{I}$, since ($\sigma_\delta^2 \ne 0$, $\sigma_v^2 \ne 0$). For the minimization of $\sigma_{IS}^2$ we use a stochastic search because we have no information about the error set $E^{DVI} \subset \mathcal{D} \times \mathcal{V} \times \mathcal{I}$.

Considering the product space $\mathcal{D} \times \mathcal{V} \times \mathcal{I}$, with $\mathcal{V}$ being the 2N-dimensional noise sample space and $\mathcal{I}$ the one-dimensional space of the initialization phase error, we modelled the initialization error $\hat\phi_1^{P}$ as Gaussian, its parameters being easily estimated in a short simulation preamble. The parameter $\gamma_1^{P}$ was kept constant at its value in the error floor, that is $\gamma_1^{P} = \gamma^{P}(\exp(-\sigma_\delta^2/2))$.

We use a parametric AIS technique adapted from that proposed in [3]. We estimate the conditional mean $E\{(\Delta_N, V_N, I) \mid (\Delta_N, V_N, I) \in E^{DVI}\}$. Optimization must yield a multiple term solution that will constitute the modified density $p^*(\Delta_N, V_N, I)$. The proposed estimation cycle is repeated successively in our case, as $\sigma_v^2$ increases while $\sigma_\delta^2$ is kept constant. The biased density is presented in the next subsection for M > 2. The major modifications we have made to the technique proposed in [3] consist in using, as starting points for the search, the optimized biases in $\partial A_{2\pi}$ and no bias at all for $V_N$ and $I$. This shortens the time required for starting the AIS algorithm. Quadrant separation in $\mathcal{D}$ is essential to keep the different bias terms separated during optimization.
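The conditional-mean estimation at the core of a parametric AIS scheme can be sketched as follows. This is a generic single-term illustration under assumed conditions (Gaussian nominal density, one bias vector, a known starting point), not the algorithm of [3] nor the authors' multi-term variant; `adaptive_bias` and its arguments are hypothetical names.

```python
import numpy as np

def adaptive_bias(in_error_set, b0, sigma, n_per_iter, n_iter, rng):
    """Iteratively estimate E{x | x in E} for x ~ N(0, sigma^2 I),
    sampling each cycle from N(b, sigma^2 I) with the current bias b."""
    b = np.asarray(b0, dtype=float).copy()
    for _ in range(n_iter):
        x = b + sigma * rng.normal(size=(n_per_iter, b.size))
        hit = in_error_set(x)
        if not hit.any():
            continue                      # no error events: keep current bias
        xe = x[hit]
        # log of the likelihood ratio p(x) / p*(x) for the error events
        log_w = (-np.sum(xe**2, axis=1)
                 + np.sum((xe - b)**2, axis=1)) / (2.0 * sigma**2)
        w = np.exp(log_w - log_w.max())   # rescaled for numerical stability
        # Weighted sample mean of the error events becomes the new bias
        b = (w[:, None] * xe).sum(axis=0) / w.sum()
    return b
```

Starting the search from a good initial bias, as the paper does with the error floor solutions, avoids the early cycles in which few or no error events are observed.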

3.4.  Implementation  aspects 

In the error floor, $I_{i|j}$ depends on the result of estimate $\hat A_{k-1}$. However, the correct initialization ($I_{j|j} = 0$ in the error floor) happens naturally almost all the time for the modulation considered. We conducted tests with all the possible values for $I_{i|j}$, their a priori probability $P(I_{i|j})$ being estimated recursively, and the differences from using only $I_{j|j}$ were negligible. For the noisy channel, we modeled $I_{j|j}$ as Gaussian ($I \sim \mathcal{N}(0, s_I^2)$) as explained before, with $s_I$ estimated in a short preamble due to its dependence on both $\sigma_\delta^2$ and $\sigma_v^2$.

We considered until now binary signaling (M = 2). With an M-ary signaling scheme, the error set conditioned on $a_s$ is the union

$$E_s = \bigcup_{\substack{i=1 \\ i \ne s}}^{M} E_{i|s}$$

which may render IS biasing suboptimal. To mitigate this, we introduce another level of multiple biasing in our IS simulation. Our biased density is then a Gaussian mixture of $(M-1) \cdot N_m$ terms for appropriate target $t$ addressing and error set $E_{Q_i}$ coverage respectively. The referred density becomes

$$p^*(\Delta_N, V_N, I \mid a_s) = \sum_{m=1}^{N_m} \sum_{\substack{t=1 \\ t \ne s}}^{M} P\left(B(t,m)_{a_s}\right)\, p\left(\Delta_N, V_N, I \mid B(t,m)_{a_s}\right) \qquad (12)$$

where $p(\Delta_N, V_N, I \mid B(t,m)_{a_s})$ is the $3N + 1$ dimensional Gaussian term with mean $B(t,m)_{a_s}$ - the bias vector in $\mathcal{D} \times \mathcal{V} \times \mathcal{I}$ if $a_s$ was transmitted. The $P(B(t,m)_{a_s})$ are the sampling probabilities for the bias terms. They were made inversely proportional to the Euclidean norms of the respective bias shift terms, but we do not know how closely this option approaches optimality. The simulation density at the error floor is a mutatis mutandis simplification of (12), since we are only generating $\Delta_N \in \mathcal{D}$.

4.  RESULTS 

We tested our IS methodologies in a practical example with 4-FSK modulation ($\Delta f = 2.6/(2\pi T_s)$ rad s$^{-1}$ between adjacent symbols). The number of samples per symbol was set to N = 10. Simulation gain, denoted by $\Gamma_s$, is defined by the ratio between the MC simulation time, $T_{MC}$, and the corresponding IS time, $T_{IS}$ ($\Gamma_s = T_{MC}/T_{IS}$). Simulations were stopped when empirical precision reached a value lower than 10% (see for example [6]). Figure 3 represents simulation data corresponding to the error floors for values of $\sigma_\delta^2$ ranging from 0.1 rad$^2$ ($1/\sigma_\delta^2 = 10$) to 0.03 rad$^2$ ($1/\sigma_\delta^2 = 33.3$). Density biasing was done in $\mathcal{D}$ according to LDT principles. The left vertical scale represents the estimate of symbol error probability, $P_s$, whereas the right vertical scale represents simulation gain $\Gamma_s$.

Notice the practical coincidence of $P_s$ values of IS (mark o) with those of MC (mark x) in the range ($1/\sigma_\delta^2 = 10$) to ($1/\sigma_\delta^2 = 22.2$). Simulation gains increase, in this range, from $\Gamma_s = 55$ to $\Gamma_s = 24000$. The value of $T_{MC}$ for ($1/\sigma_\delta^2 = 22.2$) is 13.7 hours using a PIII@450MHz computer.

The IS results presented in Figure 4 were obtained with AIS, as they concern the receiver performance in a wide range of operating conditions (including the error floor). Also represented are the values of $P_s$ obtained with MC for $\sigma_\delta^2 = 0.05$ rad$^2$ and values of $E_b/N_0$ equal to 19, 22, 25 and 31 dB; the corresponding gains $\Gamma_s$ were 8.2, 54.3, 158.2 and 451 respectively; for $E_b/N_0 = 17$ dB, there is no practical simulation gain. Once again we stress the practical coincidence of the estimated values of $P_s$ provided by both simulators. Points on the curve corresponding to $\sigma_\delta^2 = 0.045$ rad$^2$ took 10 minutes (with the above mentioned computer) in the range of [20, 50] dB, while the MC points in the same range would take an estimated 72.5 hours.


Fig.  4.  Ps  versus  Eb/No  for  two  values  of 
ABSTRACT

In a wireless communication environment, interferers are often assumed to be Poisson distributed in space. While they are alive, they constantly emit pulses which interfere with signal reception at the observation point. Each transmitted pulse suffers a power-law attenuation with the distance to the receiver. It has been shown in the past that at baud rate sampling at the receiver, the interference is marginally α-stable. In this paper we consider the interference at different sampling points. We assume that the interferers have their own life of transmission sessions, the durations of which are heavy-tail distributed random variables. This assumption is consistent with the characteristics of multimedia traffic. We show that the resulting interference at different sampling points is a jointly α-stable process and exhibits long-range dependence in the generalized sense. Numerical simulations are consistent with our theoretical findings.

1.  INTRODUCTION 

In  a  wireless  communications  network,  interference  at  the  receiver 
is  composed  of  interference  due  to  other  transmitting  terminals 
(self-interference)  and  thermal  noise. 

Knowledge of the statistics of interference is important in signal detection. While the thermal noise term can be Gaussian distributed, the self-interference part significantly deviates from Gaussianity. The α-stable distribution [6] has been shown to be of particular interest for modeling self-interference.

According to a statistical-physical model, originating from the works of Furutsu and Ishida [2], and later advanced by Middleton [4], Sousa [3], Nikias [6], [5], the contributions of a Poisson distributed field of interferers, subjected to power-law attenuation as they propagate to the receiver, add up to a marginally α-stable noise process.

In [3], [6], [5], the samples of the interference obtained at symbol rate are assumed to be independent and identically distributed. However, if one takes into account the characteristics of the transmission periods of each interferer, the i.i.d. assumption does not hold. Once an interferer starts transmitting, it remains on for a certain period of time. This by itself introduces some correlation in the transmitted signal, which depends on the distribution of the session life.

In this paper we show that the self-interference at any two sampling points is jointly α-stable distributed. By further assuming that the session duration of each interferer is heavy-tail distributed, we show that the self-interference at the sampling points is a long-range dependent process in the generalized sense. The assumption on heavy-tailed session durations (user holding times) is consistent with characteristics of high-speed wireline network measurements. Although no extensive studies have been conducted on the characteristics of wireless network traffic, assuming seamless connectivity between wireline and wireless networks would imply similar characteristics between wireless and wireline user holding times. The latter result suggests that the noise is strongly correlated and impulsive, which poses new challenges in signal detection at the receiver.

The paper is organized as follows. In the next section, relevant mathematical background is provided. Section 3 gives a detailed description of the noise model. The joint statistics are investigated in Section 4, followed, in Section 5, by the proof of long-range dependence under the assumption that the session life of the interferers is heavy-tail distributed. Simulations and examples are given in Section 6.

2.  MATHEMATICAL  BACKGROUND 

2.1.  Multivariate  a-Stable  Distributions  [8] 

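Vector $X = (X_1, X_2, \dots, X_d)$ is an α-stable random vector in $\mathbb{R}^d$ if and only if there exists a finite measure $\Gamma$ on the unit sphere $S_d$ of $\mathbb{R}^d$ and a vector $\mu$ in $\mathbb{R}^d$ such that the characteristic function $\Phi(\omega) = E\exp\{j\sum_{k=1}^{d}\omega_k X_k\}$, for $0 < \alpha < 2$, is given by

$$\Phi(\omega) = \exp\left\{-\int_{S_d} |\langle\omega, s\rangle|^{\alpha}\left(1 - j\,\mathrm{sign}(\langle\omega, s\rangle)\tan\frac{\pi\alpha}{2}\right)\Gamma(ds) + j\langle\omega, \mu\rangle\right\} \qquad (1)$$

if $\alpha \ne 1$. If $\alpha = 1$, $\tan\frac{\pi\alpha}{2}$ is replaced by $-\frac{2}{\pi}\ln|\langle\omega, s\rangle|$. $\alpha \in (0, 2]$ is the characteristic exponent.

Suppose d = 1. Then $S_1$ consists of two points {-1} and {1}, and the spectral measure $\Gamma$ is concentrated on them. The distribution becomes the univariate α-stable distribution, denoted by $X \sim S_\alpha(\sigma, \beta, \mu)$. If $\alpha \ne 1$, its characteristic function is given by

$$\begin{aligned}
\Phi(\omega) &= \exp\left\{-|\omega|^{\alpha}\left[\big(\Gamma(\{1\}) + \Gamma(\{-1\})\big) - j\,\mathrm{sign}(\omega)\big(\Gamma(\{1\}) - \Gamma(\{-1\})\big)\tan\frac{\pi\alpha}{2}\right] + j\mu\omega\right\} \\
&= \exp\left\{-\sigma^{\alpha}|\omega|^{\alpha}\left(1 - j\beta\,\mathrm{sign}(\omega)\tan\frac{\pi\alpha}{2}\right) + j\mu\omega\right\}
\end{aligned} \qquad (2)$$

where $\sigma$ is the scale parameter, $\beta$ is the skewness parameter, and $\mu$ is the location parameter.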
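The univariate characteristic function (2) is straightforward to evaluate numerically. The following sketch is an illustration only; the function name `stable_cf` and its default arguments are assumptions, and the case α = 1 (which needs the logarithmic form) is deliberately left out.

```python
import numpy as np

def stable_cf(w, alpha, sigma=1.0, beta=0.0, mu=0.0):
    """Characteristic function of S_alpha(sigma, beta, mu) for alpha != 1,
    following the second form of equation (2)."""
    if alpha == 1:
        raise ValueError("alpha = 1 requires the logarithmic form")
    w = np.asarray(w, dtype=float)
    skew = 1.0 - 1j * beta * np.sign(w) * np.tan(np.pi * alpha / 2.0)
    return np.exp(-(sigma**alpha) * np.abs(w)**alpha * skew + 1j * mu * w)
```

For α = 2 the skewness term vanishes and the expression reduces to exp(-σ²ω² + jμω), a Gaussian characteristic function with variance 2σ².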
2.2. Codifference

α-stable distributions are known for their lack of moments of order larger than α. In particular, for α < 2, the second-order statistics do not exist. In such case, the role of covariance is played by the covariation or the codifference [8].

The codifference of two jointly SαS, 0 < α < 2, random variables $X_1$ and $X_2$ equals:

$$R_{X_1, X_2} = \gamma_{X_1} + \gamma_{X_2} - \gamma_{X_1 - X_2} \qquad (3)$$

where $\gamma_X$ is the scale parameter of the SαS variable X.

A quantity that is closely related to the codifference $R_{x(t+\tau), x(t)}$ is [8]:

$$I(\rho_1, \rho_2; \tau) = -\ln E\left\{e^{j(\rho_1 x(t+\tau) + \rho_2 x(t))}\right\} + \ln E\left\{e^{j\rho_1 x(t+\tau)}\right\} + \ln E\left\{e^{j\rho_2 x(t)}\right\}. \qquad (4)$$

The above quantity reduces to the codifference for the case of jointly SαS processes, i.e.

$$R_{x(t+\tau), x(t)} = -I(1, -1; \tau). \qquad (5)$$

2.3. Long-Range Dependence

A second-order process x(t) is called a (wide-sense) stationary process with long memory, or long-range dependence, if its autocorrelation function, $\rho(\tau)$, satisfies [9]:

$$\lim_{\tau \to \infty} \rho(\tau)/\tau^{\beta - 1} = c \qquad (6)$$

for some positive constant c and $\beta \in (0, 1)$. From (6), it can be seen that a long-memory process is characterized by an autocorrelation that decays hyperbolically as the lag $\tau$ increases. This is in contrast with the exponential decay corresponding to short memory processes, e.g. ARMA.

The following generalization of the concept of long memory process can be useful for processes that lack an autocorrelation [10].

Definition 1 Let x(t) be a stationary process. We say that x(t) is a long-memory process in a generalized sense, if $I(1, -1; \tau)$, as defined in (4), satisfies

$$\lim_{\tau \to \infty} -I(1, -1; \tau)/\tau^{\beta - 1} = c \qquad (7)$$

where c is some real positive constant and $0 < \beta < 1$.

The notion of generalized long memory has been used in [10] to study the dependence structure of the power-law shot noise (see also [13]).

3. THE INTERFERENCE MODEL

As in [3], we assume an infinite number of potential sources in the source domain. They are emitting pulses that may be seen at the receiver. Our basic unit of time is the symbol interval, or slot. In other words, we are considering a discrete time process. The fundamental assumptions of the interference model are the following.

1). At any given time slot, there is a random number of emerging sources which begin to emit waveforms that interfere with the signal of interest. The number of emerging sources is a Poisson random variable. Moreover, their locations are also spatially Poisson distributed. In a two-dimensional space, the number of emerging sources in a region R of area A is Poisson distributed with density $\lambda A$, i.e.

$$P[\text{Number of sources in } R = k] = \frac{e^{-\lambda A}(\lambda A)^k}{k!}. \qquad (8)$$

The parameter $\lambda$ is not necessarily constant. In a non-homogeneous case, a transformation can be performed to map the Poisson process from a non-homogeneous to a homogeneous one, cf. [3].

2). Once an interference source begins to transmit, it constantly emits waveforms for a random duration of time (session life). From the receiver's point of view, the resulting interference is a symmetrically distributed random variable.

3). The waveform propagation loss increases logarithmically with increasing distance between the source and the receiver. In terms of signal amplitude loss function, it can be written as


where r is the distance and $\gamma$ may vary from 1 to 6 in different environments.

4). The sources originated at different time slots are assumed to be independent of each other. The inception or termination of emission of a source will not affect any other sources.

The receiver, using an omni-directional antenna, is located in the center of the space (plane or volume) in which we are interested. The received signal is given by

$$z(t) = s(t) + \sum_{i \in \text{active sources}} a(r_i)\, x_i(t), \qquad (10)$$

where s(t) is the signal of interest and the sum is the interference. We assume a standard correlation receiver, which correlates the received signal with a set of basis functions $\{\phi_k(t), k = 1, \dots, n\}$ and produces n-dimensional vectors Z, S and X, such that

$$Z^l = S^l + \sum_{i} a(r_i)\, X_i^l \qquad (11)$$

where the superscripts represent the l-th time slot, or symbol interval, and the $X_i$ represent the contribution from the i-th source.

Now consider the instantaneous interference at any symbol interval m. In order to calculate the instantaneous statistics, the number and locations of the active sources at any given symbol interval need to be specified.

Proposition 1 Assuming the mean of the random session life of the interferers is finite, denoted by $\mu$, and the density of the emerging sources in the space at every time slot is $\lambda$, asymptotically, the active number of sources in any time slot is Poisson distributed in the space with density $\lambda\mu$.¹

An immediate result follows from this statement by use of the result in [3]:

1  Due  to  space  limit,  we  omitted  all  the  proofs  in  this  paper. 
Corollary 1 If the received influence $X_i$ for each interferer is spherically symmetric distributed,² the instantaneous interference is SαS distributed, with characteristic exponent $\alpha = 2/\gamma$.

² A random vector X is spherically symmetric if its characteristic function depends only on the Euclidean norm of t, i.e. $\Phi_X(t) = \phi(\|t\|)$.

4. JOINT STATISTICS

The interference at different symbol time intervals constitutes a stochastic process. Its dependence structure depends on the transmitting characteristics of the co-channel users. It has already been shown that at any given symbol interval, the marginal distribution of the self-interference is α-stable. It remains to be seen whether the interference at different time slots is jointly α-stable.

The interference at the m-th symbol interval is

$$Y^m = \sum_{i} a(r_i)\, X_i^m. \qquad (12)$$

To simplify the presentation, we assume the vector X is one dimensional. To evaluate the joint statistics of Y, we calculate the quantity

$$E\left\{\exp\left[j\omega_1 Y^m + j\omega_2 Y^n\right]\right\}. \qquad (13)$$

Specifically, we have the following result.

Proposition 2 Let the mean of the random session life, L, be finite with complementary distribution

$$\bar F_L(k) = P[L \ge k], \quad k = 1, 2, \dots \qquad (14)$$

Then the joint characteristic function of $Y^m$ and $Y^n$ is given by

$$\Phi_{m,n}(\omega_1, \omega_2) = \exp\left\{-\sigma\left[H_1(\tau)|\omega_1|^{\alpha} + H_2(\tau)|\omega_1 + \omega_2|^{\alpha} + H_1(\tau)|\omega_2|^{\alpha}\right]\right\} \qquad (15)$$

where

$$\sigma = -\lambda\pi \int_0^{\infty} x^{-\alpha}\, d\Phi_0(x) \qquad (16)$$

$$H_1(\tau) = \sum_{l=1}^{m-n} \bar F_L(l) \qquad (17)$$

$$H_2(\tau) = \sum_{l=m-n+1}^{\infty} \bar F_L(l). \qquad (18)$$

Here $\Phi_0(\cdot)$ is the characteristic function of X.

Remark 1 By setting $\omega_2 = 0$, we obtain the first order characteristic function of the interference process, i.e.,

$$\Phi(\omega_1) = e^{-\sigma \sum_{l=1}^{\infty} \bar F_L(l)\, |\omega_1|^{\alpha}}. \qquad (19)$$

Recognizing that $\sum_{l=1}^{\infty} \bar F_L(l) = \mu$, which is the mean of the transmission life of the co-channel users, we get the same result as in the last section.

Remark 2 As $\tau$ tends to infinity, $H_2(\tau)$ tends to zero, and $H_1(\tau)$ tends to the mean of transmission time, $\mu$. The joint characteristic function may be simplified as

$$\lim_{\tau \to \infty} \Phi_{m,n}(\omega_1, \omega_2) = e^{-\sigma\mu\left(|\omega_1|^{\alpha} + |\omega_2|^{\alpha}\right)}. \qquad (20)$$

Eq. (20) implies that when the distance between two samples becomes asymptotically large, they become independent α-stable random variables.

Corollary 2 Indeed, the interference at two different time slots, which are separated by $\tau$, is jointly α-stable distributed. The two samples may be represented by different linear combinations of independent α-stable random variables.


5.  LONG-RANGE  DEPENDENCE 


Since the interference at any two symbol intervals is jointly SαS, conventional tools such as the autocorrelation, which measure the dependence structure of a stochastic process, are not applicable. We adopt the codifference as previously defined in (4). The motivation behind this is that in many practical communication systems, the session life of the interferers is indeed heavy-tail distributed.
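For intuition, the quantity in (4) can be estimated from a single realization by replacing each expectation with a sample average. The sketch below is a plain empirical-characteristic-function estimator, given as an assumption for illustration (the paper does not specify an estimator here); the name `generalized_codifference` is hypothetical and the process is assumed stationary.

```python
import numpy as np

def generalized_codifference(x, lag, rho1=1.0, rho2=-1.0):
    """Empirical estimate of -I(rho1, rho2; lag) from one realization x,
    with expectations replaced by time averages (lag must be >= 1)."""
    x = np.asarray(x, dtype=float)
    ahead, behind = x[lag:], x[:-lag]          # x(t + lag) and x(t)
    joint = np.mean(np.exp(1j * (rho1 * ahead + rho2 * behind)))
    first = np.mean(np.exp(1j * rho1 * ahead))
    second = np.mean(np.exp(1j * rho2 * behind))
    big_i = -np.log(joint) + np.log(first) + np.log(second)
    return float(np.real(-big_i))
```

For a Gaussian process, -I(1, -1; τ) equals the ordinary covariance at lag τ, which gives a convenient sanity check before applying the estimator to impulsive data.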

An example is given by the communication links in a spread spectrum packet radio network. In a spread spectrum network, multiple access terminals use the same channel. The signal received at the receiver consists of the superposition of the signals from all the users in the network. Assuming no multiuser detection and power control, the interference from other users, or self-interference, falls into the scenario described in this paper. As more and more wireless users are equipped with data transmission enabled cellphones, the resource request holding time is found to exhibit more and more variation. In other words, the holding times are heavy-tail distributed (cf. [7]).

A simple but reasonable assumption on the distribution of the session
life of the interferers is that it is Zipf distributed. The Zipf
distribution is a discrete version of the more familiar Pareto
distribution. A random variable X has a Zipf distribution [1] if

$$P\{X > k\} = \left[1 + \left(\frac{k - k_0}{\sigma}\right)\right]^{-\alpha}, \quad k = k_0, k_0 + 1, k_0 + 2, \ldots \quad (21)$$

where $k_0$, the location parameter, is an integer, $\sigma$, the scale
parameter, is positive and $\alpha$, the tail index, is positive. In this
paper, to simplify presentation, we set $\sigma = k_0 = 1$ and $\alpha > 1$,
which implies that the mean of the session life is finite.
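As an illustration of the session-life model in (21), the following minimal sketch draws Zipf-distributed session lives by inverting the survival function. The function names and the inversion-based sampler are assumptions made for this example and are not taken from the paper.

```python
import math
import random


def zipf_survival(k, alpha, k0=1, sigma=1.0):
    """P{X > k} from (21), valid for integer k >= k0."""
    return (1.0 + (k - k0) / sigma) ** (-alpha)


def sample_zipf(alpha, k0=1, sigma=1.0, rng=random):
    """Draw one session life by inverting the survival function in (21).

    Assumption: a continuous Pareto excess is rounded up to the next
    integer, which reproduces P{X > k} exactly at integer k.
    """
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids u == 0
    pareto_excess = sigma * (u ** (-1.0 / alpha) - 1.0)
    return k0 + math.ceil(pareto_excess)


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [sample_zipf(1.2, rng=rng) for _ in range(100_000)]
    for k in (1, 2, 5, 10):
        empirical = sum(d > k for d in draws) / len(draws)
        print(k, round(empirical, 3), round(zipf_survival(k, 1.2), 3))
```

With $\sigma = k_0 = 1$ the survival function reduces to $k^{-\alpha}$, so the empirical and theoretical columns printed above should agree closely for small k.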


Proposition 3 Asymptotically, the process formed by any components
of the self-interference at different symbol intervals is a
long-range dependent $\alpha$-stable process, i.e.,


lim 

n  — 400 


-/( 1,-lj  w) 

n°L~t 


—  c 


where 

$n$ : time distance between symbol intervals;
$\alpha_L$ : tail index of the session lifetime;
$c$ : positive constant.


(22) 


(23) 


6.  SIMULATIONS 


In this section, we perform numerical simulations whose settings
follow the aforementioned scenario. We assume standard correlator
receivers for a communication link that is subjected to a Poisson
field of interferers. The density of the interferers is $\lambda = 20$,
and the session life is Zipf distributed with $\alpha = 1.2$. The
interference waveforms are rectangular pulses with random amplitude
of 1 or -1. Fig. 1 shows the self-interference present at the receiver.
We apply the characteristic function based method [11] to estimate the
tail index, and the result is 0.6269, which is close to the theoretical
value $2/\gamma \approx 0.667$ ($\gamma = 3$).

As shown in the last section, when the session life is heavy-tail
distributed, the interference must exhibit long-range dependence.
We calculated the codifference of the interference, and the result
is shown in Fig. 2 on a log-log scale. The linearity of the log-log
plot confirms that the self-interference is long-range dependent in
the generalized sense. The estimated slope is -0.1570, which is in
good accordance with the theoretical value, $1 - \alpha = -0.2$.
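The slope quoted above comes from a least-squares line fitted to the log-log plot. A minimal sketch of such a fit is given below; the synthetic power-law sequence is a stand-in assumed for illustration and is not the simulated codifference of the paper.

```python
import math


def loglog_slope(lags, values):
    """Least-squares slope of log(values) against log(lags)."""
    xs = [math.log(n) for n in lags]
    ys = [math.log(v) for v in values]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


if __name__ == "__main__":
    alpha = 1.2
    lags = list(range(1, 201))
    # Hypothetical codifference magnitudes decaying as n^(1 - alpha).
    codiff = [3.0 * n ** (1.0 - alpha) for n in lags]
    print(loglog_slope(lags, codiff))
```

For an exact power law the fitted slope recovers the exponent; with estimated codifferences the fit is noisy, which is one plausible reason for the gap between -0.1570 and -0.2.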

7.  CONCLUSIONS 

In this paper, we show that in a Poisson field of interferers, where
the path loss is a power-law function, the interference at different
time slots is jointly alpha-stable distributed. If we assume further
that the transmission session lives of the interferers are heavy-tail
distributed, the resulting interference is long-range dependent in the
generalized sense. Numerical simulations confirm our theoretical
derivations.

Figure 1: Self-interference present in a communication link in a
Poisson field of interferers.


Figure 2: Log-log plot of generalized codifference of the
self-interference. The slope of the least squares fitted line is -0.1570.

ABSTRACT

Approximations to the locally optimum and locally optimum rank
score functions for the detection of a known signal in additive
symmetric $\alpha$-stable interference have recently been shown to
introduce only slight performance loss. Here, the location of the
apices of the nonlinearities is shown to follow an approximate linear
relationship with the characteristic exponent, $\alpha$, of the
interference. The distribution of the corresponding test statistics is
also found to be approximately Gaussian. These findings make
implementation of the detectors more feasible and remove some
expensive computational burden.

Keywords:  locally  optimum  detection,  nonparametric  statistics, 
alpha-stable  distribution. 

1.  INTRODUCTION 

Detectors  of  a  known  signal  in  additive  interference  of  unknown 
power  have  been  formulated  for  a  number  of  interference  distribu¬ 
tions  -  the  most  famous  and  widely  used  being  the  matched  filter 
(MF)  for  Gaussian  interference. 

Sources of impulsive interference have presented some interesting
problems for detection. Many conventional techniques perform poorly
for heavy-tailed distributions. Additionally, the $\alpha$-stable
($\alpha$S) distribution, which can be used to describe some impulsive
processes [1], has no general closed form expression for its pdf, thus
making difficult the use of likelihood ratio procedures. For this
reason, it is necessary to investigate alternative strategies for the
detection of signals in $\alpha$S interference.

Consider the model

$$X = \theta s + W \quad (1)$$

where $X = [X_1, X_2, \ldots, X_n]^T$ is the model for the real-valued
observations, $s = [s_1, \ldots, s_n]^T$ is the known, deterministic
signal to be detected, $W = [W_1, W_2, \ldots, W_n]^T$ is a stationary,
iid, symmetric $\alpha$-stable (S$\alpha$S) interference process and
$\theta$ is a non-negative, real, unknown parameter. To determine the
presence of $s$ in $X$, the tested hypothesis is $H: \theta = 0$ against
$K: \theta > 0$.

In the following sections, signal detection in S$\alpha$S interference
using correlation and rank-based detectors is discussed, including
suggested approximations that will aid in the implementation of
the said detectors. Following that, the distribution of the test
statistics under consideration is discussed.

This  work  was  conducted  while  the  author  was  with  the  Australian 
Telecommunications  Research  Institute  &  School  of  Electrical  and  Com¬ 
puter  Engineering,  Curtin  University  of  Technology,  Australia 


2.  LOCALLY  OPTIMUM  DETECTORS 

Locally optimum (LO) tests attain the best detection performance
amongst the class of detectors of the same size for weak signal
conditions, that is, they maximise the slope of the detector power
function at $\theta = 0$. An LO detector for the detection of a signal
in the model (1) uses the test statistic [2]

$$T_{LO}(X) = \sum_{i=1}^{n} s_i \, g_{LO}(X_i) \quad (2)$$

where the nonlinear score function is

$$g_{LO}(x) = -\frac{f_W'(x)}{f_W(x)}$$

and $f_W'(x)$ is the first derivative of $f_W(x)$, the pdf corresponding
to the random process $W$.

Due to the absence of closed form expressions for $f_W(x)$
when $W$ is $\alpha$S distributed, $g_{LO}(x)$ cannot be found exactly.
In [1], $g_{LO}$ was found numerically for some S$\alpha$S cases and
plotted for a number of values of $\alpha$. These results are reproduced
in Figure 1. Here, and throughout the rest of the paper, the scaling
parameter of the S$\alpha$S distribution is set to 1, $\gamma = c^\alpha = 1$.
No generality is lost since if $Y$ is an S$\alpha$S random process with
scaling parameter $c$, then $Z = Y/c$ is an S$\alpha$S random process
with scaling parameter of 1.


Fig. 1. Locally Optimum score functions for various S$\alpha$S
distributions.

While these numerical approximations can be made to be extremely
accurate, their computational complexity requires that practical
implementation uses coarser approximations. These approximations
invalidate the "locally optimum" feature of the detector, and yield
locally suboptimum (LSO) detectors.

Recently, the LSO-power nonlinearity was introduced [3, 4]

$$g_{LSO-P}(x) = \begin{cases} c\,x, & |x| \le \Lambda \\ \dfrac{\alpha + 1}{x}, & |x| > \Lambda \end{cases}$$

where $c = (\alpha + 1)/\Lambda^2$ and $\Lambda = \arg\max_x g_{LO}(x)$ is
the location of the peak of $g_{LO}$. It was shown that the detector
using this nonlinearity achieves near locally optimum detection. The
nonlinearity decays at the same asymptotic rate as $g_{LO}$ [1], as is
shown in Figure 2. Other approximations to the computationally complex
$g_{LO}$ have been suggested in [5, 6, 7].


Fig. 3. Location of the apex of $g_{LO}(x)$ for varying $\alpha$, as well
as the line of best fit.


where

$$g_{LOR}(i) = E_H\left[ g_{LO}\left(|X|_{(i)}\right) \right],$$

$r_i$ is the rank of $|X_i|$ in the set $\{|X_1|, |X_2|, \ldots, |X_n|\}$
and $|X|_{(i)}$ is the set's $i$th smallest member. $E_H[\cdot]$ denotes
the expectation operation under the hypothesis $H$. Asymptotically as
$n \to \infty$, the LOR and LO detectors become equivalent.

The LOR nonlinearities, $g_{LOR}$, for a number of S$\alpha$S
distributions have been approximated numerically [3, 4] and are shown in
Figure 4. In contrast to $g_{LO}$, which has a slow rate of decay and
infinite span, $g_{LOR}$ need only be evaluated at n points. The results
may be found once to a high degree of accuracy off-line and stored
for on-line detection.


Fig. 2. Locally Optimum and Locally Suboptimum nonlinearities
for $\alpha = 1.6$.

When the $g_{LSO-P}$ nonlinearity was introduced, the location of
the apex, $\Lambda$, was found by numerically locating the peak of the
$g_{LO}$ nonlinearity. No expressions exist for finding $\Lambda$ for a
given $\alpha$. However, when the location of the apex of $g_{LO}(x)$ is
plotted against $\alpha$, the relationship appears very linear. This is
shown in Figure 3 along with the line of best fit. The equation of the
line is

$$\Lambda \approx 2.73\,\alpha - 1.75$$

Note that as $\alpha \to 2$, $g_{LO}(x)$ becomes a straight line and,
therefore, the approximation breaks down.

With  the  aid  of  this  relationship,  it  is  now  feasible  to  construct 
a  LSO  detector  using  the  LSO-power  nonlinearity  without  a  heavy 
computational  burden. 
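The linear fit can be checked against the one case with a closed-form pdf, the Cauchy distribution ($\alpha = 1$), for which the LO score is $2x/(1 + x^2)$ with its apex at x = 1. The sketch below compares that apex with the fitted line and builds an LSO-power score from the fitted apex. The function names are hypothetical, and the slope of the linear branch is an assumption chosen so that the two branches meet at the apex.

```python
def apex_linear_fit(alpha):
    """Apex location from the fitted line Lambda ~ 2.73*alpha - 1.75."""
    return 2.73 * alpha - 1.75


def g_lo_cauchy(x):
    """Exact LO score -f'(x)/f(x) for the standard Cauchy pdf (alpha = 1)."""
    return 2.0 * x / (1.0 + x * x)


def g_lso_power(x, alpha, apex):
    """LSO-power score: linear up to the apex, (alpha + 1)/x beyond it.

    Assumption: the linear slope is (alpha + 1)/apex^2, which makes the
    score continuous at |x| = apex.
    """
    if abs(x) <= apex:
        return (alpha + 1.0) * x / (apex * apex)
    return (alpha + 1.0) / x


def numeric_apex(score, lo=0.0, hi=10.0, steps=100_000):
    """Grid search for the location of the maximum of `score` on [lo, hi]."""
    best_x, best_val = lo, score(lo)
    for i in range(1, steps + 1):
        x = lo + (hi - lo) * i / steps
        val = score(x)
        if val > best_val:
            best_x, best_val = x, val
    return best_x


if __name__ == "__main__":
    print("numerical apex, alpha = 1:", numeric_apex(g_lo_cauchy))
    print("fitted apex,    alpha = 1:", apex_linear_fit(1.0))
    apex = apex_linear_fit(1.6)
    print("LSO-power score at the apex, alpha = 1.6:",
          g_lso_power(apex, 1.6, apex))
```

For the Cauchy case the fitted line gives an apex of 0.98 against the exact value of 1, which is consistent with the claim that the relationship is close to linear away from $\alpha = 2$.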

3.  LOCALLY  OPTIMUM  RANK  DETECTORS 

It has long been accepted that by using a weak set of assumptions,
rank-based tests can achieve robust performance while often
only suffering slight losses in efficiency against parametric tests
[8]. The locally optimum rank (LOR) detector, when $W$ is
symmetrically distributed, uses the following test statistic [9]

$$T_{LOR}(X) = \sum_{i=1}^{n} s_i \, \mathrm{sgn}(X_i) \, g_{LOR}(r_i) \quad (3)$$


Fig. 4. Locally Optimum Rank score functions for various S$\alpha$S
distributions when n = 100.

If off-line estimation of $g_{LOR}$ is not possible, then an
appropriate locally suboptimal rank (LSOR) nonlinearity is a triangular
score function [3] where, again, the location of the apex varies
with $\alpha$. These values, normalised by the sequence length n, are
shown in Figure 5 with the line of best fit having the equation
$0.5962\,\alpha - 0.0873$.

An interesting special case is when $\alpha = 1$, i.e. the Cauchy
distribution. It has already been noted that an optimal Cauchy
receiver performs very well for a range of values of $\alpha$, but
particularly when it is close to 1 [7]. In Figure 6 it can be seen that
$g_{LOR}$ for $\alpha = 1$ is very well fitted by a quadratic score
function

Fig. 5. Location of the apex, divided by the sequence length n, of
$g_{LOR}(i)$ for varying $\alpha$, as well as the line of best fit.

Fig. 6. The LOR score function and a quadratic of best fit, for the
Cauchy distribution ($\alpha = 1$).

$$g(i) = -4.05 \left(\frac{i}{n}\right)^2 + 4.09\, \frac{i}{n} - 5.29 \times 10^{-2} .$$

It  should  be  remembered  that  scaling  a  score  function  does  not 
affect  the  corresponding  detector’s  performance.  Therefore,  any 
centred  quadratic  function  would  suffice. 

4.  DISTRIBUTION  OF  TEST  STATISTICS 

Although the LO, LSO, LOR and LSOR detectors all use different
score functions, the similarity in their structure as correlators
means the distributions of their test statistics are very similar.

4.1.  LO  and  LSO  Detectors 

Recall that the LO and LSO detector statistics have the form

$$T(X) = \sum_{i=1}^{n} s_i \, g(X_i)$$

that is, the test statistic is the sum of independent random variables,
assuming the $X_i$ are iid and $s_i$ is some bounded, known sequence.
Under $H$, the summed variables have similar distributions, differing
only in the non-constant scale, $s_i$. If $g(X)$ has finite variance,
the Central Limit Theorem may be invoked, meaning $T(X)$ is
asymptotically Gaussian.

Since only symmetric $\alpha$S distributions are considered here,
$E[g(X)] = 0$ under $H$. Consequently,

$$E[T(X)] = 0$$

$$\mathrm{var}[T(X)] = \mathrm{var}[g(X)] \times \sum_{i=1}^{n} s_i^2 .$$

To determine if $g(X)$ has finite variance, an asymptotic expansion
of the pdf of a standardised S$\alpha$S random variable when
$\alpha < 2$ and as $|x| \to \infty$, as given by [1], is used

$$f(x) = \sum_{k=1}^{\infty} \frac{b_k}{|x|^{\alpha k + 1}} \quad (4)$$

where $b_k = \frac{(-1)^{k-1}}{\pi \, k!} \, \Gamma(\alpha k + 1) \sin\left(\frac{k \alpha \pi}{2}\right)$.

This series may be approximated by its first term, as this is the term
with the slowest rate of decay for $|x| \to \infty$,

$$f(x) \approx \frac{b_1}{|x|^{1 + \alpha}} .$$

To determine if $\mathrm{var}[g_{LO}(X)] < \infty$, consider the integral

$$I = \int g_{LO}^2(x) \, f(x) \, dx$$

and its approximation using its highest order term as $x \to \infty$

$$\int \frac{(\alpha + 1)^2}{x^2} \, \frac{b_1}{x^{1 + \alpha}} \, dx = (\alpha + 1)^2 \, b_1 \int x^{-3 - \alpha} \, dx = -\frac{(\alpha + 1)^2}{\alpha + 2} \, b_1 \, x^{-(\alpha + 2)} .$$

The highest order term of $I$ does not diverge to $\pm\infty$ as
$x \to \pm\infty$, and therefore, neither will $I$. Furthermore, both
$g_{LO}(x)$ and $f(x)$ are bounded functions, therefore it can be
concluded that evaluation of the integral, $I$, between $-\infty$ and
$+\infty$, that is, the variance of $g_{LO}(X)$, is finite.

Any further terms taken from the asymptotic expansion of the
pdf in (4) will have faster rates of decay, therefore it can be taken
that the variances of $g_{LO}(X)$ and $g_{LSO-P}(X)$ are finite and the
corresponding test statistics are asymptotically Gaussian (see Figure 7).
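The moment expressions for $T(X)$ can be illustrated in the Cauchy case ($\alpha = 1$), where the LO score is $2x/(1 + x^2)$ and a direct integration gives $\mathrm{var}[g(X)] = 1/2$, so that $\mathrm{var}[T(X)] = n/2$ when $s_i = 1$. The following Monte Carlo sketch is an assumed illustration with hypothetical function names, not the experiment reported in the paper.

```python
import math
import random
import statistics


def cauchy_sample(rng):
    """Standard Cauchy draw (SaS with alpha = 1) via the inverse CDF."""
    return math.tan(math.pi * (rng.random() - 0.5))


def g_lo_cauchy(x):
    """Exact LO score for the standard Cauchy pdf."""
    return 2.0 * x / (1.0 + x * x)


def lo_statistic(signal, rng):
    """T(X) = sum_i s_i g(X_i) under H, i.e. X_i is interference only."""
    return sum(s * g_lo_cauchy(cauchy_sample(rng)) for s in signal)


def simulate(n=100, trials=2000, seed=0):
    """Return the sample mean and variance of T(X) over repeated trials."""
    rng = random.Random(seed)
    signal = [1.0] * n
    stats = [lo_statistic(signal, rng) for _ in range(trials)]
    return statistics.fmean(stats), statistics.pvariance(stats)


if __name__ == "__main__":
    mean, var = simulate()
    print("sample mean:", mean)
    print("sample variance / n:", var / 100)
```

The sample mean should be near zero and the sample variance divided by n should be near 1/2, even though the interference itself has infinite variance, because the score function is bounded.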


4.2.  LOR  and  LSOR  Detectors 

Now consider the general form of the rank-based detector statistics
considered here

$$T_R(X) = \sum_{i=1}^{n} s_i \, \mathrm{sgn}(X_i) \, g_R(r_i)$$

Fig. 7. Normal Probability Plot of 500 $T_{LO}(X)$ statistics calculated
for n = 100 observations, $\alpha = 1.6$ and $s_i = 1$.

While $\mathrm{sgn}(X_i)$ is independent of $\mathrm{sgn}(X_j)$, $i \ne j$,
the same cannot be said of the ranks. If the possibility of ties is
neglected, each $r_i$ is an integer between 1 and n and each rank
integer occurs only once, that is $r_i \ne r_j$ if $i \ne j$. Then,
clearly, the ranking procedure introduces dependence between the terms.
However, as $n \to \infty$, this dependence becomes negligible, and
therefore asymptotically these terms are independent. Again, the
Central Limit Theorem can be used to assert the asymptotic Gaussianity
of $T_R(X)$. This is confirmed experimentally by the Normal Probability
Plot in Figure 8.


Fig. 8. Normal Probability Plot of 500 $T_{LOR}(X)$ statistics
calculated for n = 100 observations, $\alpha = 1.6$ and $s_i = 1$.

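If $|g_R(r_i)| < \infty$ for all $i = 1, \ldots, n$, then

$$E[g_R(r_i)] = \frac{1}{n} \sum_{i=1}^{n} g_R(i) \quad \text{and} \quad E[g_R^2(r_i)] = \frac{1}{n} \sum_{i=1}^{n} g_R^2(i)$$

will be finite for finite n.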

The distribution of the test statistic $T_R(X)$ under $H$ is
independent of the distribution of $X$, provided it is symmetric. While
its exact distribution may be calculated for any $g_R$ and $s$, in
practice, this is tedious and a suitably accurate approximation can be
made using the Gaussian distribution with

$$E[T_R(X)] = 0$$

$$\mathrm{var}[T_R(X)] = \frac{1}{n} \sum_{i=1}^{n} s_i^2 \sum_{j=1}^{n} g_R^2(j) .$$

Further discussion on the distribution of rank-based detection
statistics can be found in [9].
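The variance expression for $T_R(X)$ can be checked by simulation, since under $H$ the signs are independent fair coin flips and the ranks form a uniformly random permutation whatever the symmetric interference distribution. In the sketch below the triangular score shape and all function names are assumptions made for illustration; only the apex location follows the fitted line $0.5962\,\alpha - 0.0873$.

```python
import random
import statistics


def triangular_score(n, apex):
    """Hypothetical triangular rank score peaking at rank `apex`."""
    def score(r):
        if r <= apex:
            return r / apex
        return (n + 1 - r) / (n + 1 - apex)
    return score


def rank_statistic(signal, score, rng):
    """T_R(X) under H: random signs and a random permutation of the ranks."""
    ranks = list(range(1, len(signal) + 1))
    rng.shuffle(ranks)
    return sum(s * rng.choice((-1.0, 1.0)) * score(r)
               for s, r in zip(signal, ranks))


def gaussian_variance(signal, score):
    """var[T_R(X)] = (1/n) * sum_i s_i^2 * sum_j g_R(j)^2."""
    n = len(signal)
    energy = sum(s * s for s in signal)
    return energy * sum(score(j) ** 2 for j in range(1, n + 1)) / n


if __name__ == "__main__":
    n, alpha = 100, 1.6
    score = triangular_score(n, (0.5962 * alpha - 0.0873) * n)
    signal = [1.0] * n
    rng = random.Random(0)
    stats = [rank_statistic(signal, score, rng) for _ in range(5000)]
    print("sample variance:", statistics.pvariance(stats))
    print("Gaussian approximation:", gaussian_variance(signal, score))
```

Because the simulation never draws the interference itself, it also illustrates the distribution-free property of the rank statistic under $H$.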

5.  CONCLUSIONS 

Previous contributions have shown that approximations to LO and
LOR detection can be achieved through other score functions that
are more readily implementable. This concept has been extended
here by finding an approximate linear relationship between the
respective apices of $g_{LO}$ and $g_{LOR}$, and $\alpha$. Investigation
of the distribution of the detection test statistics has also shown
that, for finite sample size, they are approximately Gaussian. As a
result of these findings, practical implementation of these detectors
is easier.

ABSTRACT

Zhao and Rao have proposed linear scale-invariant systems
that operate with continuous dilation but in discrete time. This
was done through a discrete-time continuous-dilation operator
which tacitly uses warping transforms such as bilinear transforms
to implement conversion from discrete-time frequency to
continuous-time frequency. This paper introduces a more general
method based on kernels for effecting the dilation. It is shown that
the warping function based scaling is a special case. The kernel
approach results in an alternative formulation of discrete-time
linear scale-invariant systems that possesses desirable properties
not seen in the earlier formulation.

1.  INTRODUCTION 

The previous work of Zhao and Rao [10], [17]-[20] has
shown that it is possible to formulate continuous dilation Linear
Scale Invariant (DLSI) systems in discrete time. The basis for
their formulation is provided by a definition of scaling or dilation
in discrete time using warping and unwarping functions. Our
subsequent work investigating self-similarity properties of signals
generated by these systems with white noise inputs has produced
results related to their suitability for synthesizing data with desired
self-similarity parameters [4]. A motivation for studying
self-similar signals has been provided by the seminal work of Leland
et al. [1] showing that Ethernet traffic is self-similar.
Self-similarity has since been found in other types of network traffic
including wireless networks [8], [11]. Self-similar traffic gives
rise to buffering requirements that are different from, and usually
higher than, those predicted by Poisson assumptions [7]. Much of the
theoretical foundation related to the characterization of statistical
self-similarity was laid by Mandelbrot and Van Ness [6] in the
context of describing fractional Brownian motion (fBm) and
fractional noise. For simulating data such as network traffic, we
clearly require synthesis of discrete-time self-similar random
processes. Several methods have been proposed for generating
discrete-time self-similar signals [1], [2], [3], [8]. Our prior work
has demonstrated that synthesis of self-similar signals using white
noise inputs to our discrete-time LSI models produces data whose
properties are consistent with those of network traffic.

The paper is organized as follows. Section 2 provides an
overview of our earlier formulation of DLSI systems. The new
kernel-based discrete-time continuous-dilation operator of the
paper is introduced in Section 3. Section 4 describes the DLSI
systems based on the new dilation operator. Concluding remarks
are made in Section 5.

2.  OVERVIEW  OF  ZHAO  AND  RAO'S 
DLSI  SYSTEMS 

2.1  Time-Scaling 

The definition of self-similarity rests on the operation of
time scaling or dilation. Whereas it is possible to dilate a
continuous-time signal in a continuous fashion, the same cannot
be done with discrete-time signals. To avoid this difficulty, Zhao
and Rao [10], [17]-[20] define a scaling operator for discrete-time
signals that can work with any real-valued scaling factor greater
than zero, based on a warping transform $f(\omega)$ which transforms a
discrete-time frequency ($\omega$) to a continuous-time frequency
($\Omega$). The inverse transform $f^{-1}(\cdot)$ defines the
continuous-time frequency to discrete-time frequency or unwarping
transform. One example of the warping transform is the bilinear
transform (BLT)

$$\Omega = f(\omega) = 2 \tan(\omega / 2) . \quad (1)$$

Using the warping transform defined above and the time-frequency
scaling property of the continuous-time Fourier transform, the scaling
operator $S_a[\cdot]$ of a discrete-time sequence x(n) is defined by

$$y(n) = S_a[x(n)] = a \, G^{-1}\{ X[\Lambda_a(\omega)] \} \quad (2)$$

where y(n) is the output of the operator, $G^{-1}$ is the inverse
discrete-time Fourier transform (DTFT), $X(\omega)$ is the DTFT of x(n)
and $\Lambda_a(\omega) = f^{-1}[a f(\omega)]$. The scaling operator is
shown in Figure 1.
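The frequency mapping used by the scaling operator can be made concrete for the bilinear transform. The sketch below implements the warping in (1), its inverse, and the composite map $\Lambda_a(\omega) = f^{-1}[a f(\omega)]$; the function names are hypothetical and chosen only for this illustration.

```python
import math


def warp(omega):
    """Bilinear warping (1): discrete-time to continuous-time frequency."""
    return 2.0 * math.tan(omega / 2.0)


def unwarp(big_omega):
    """Inverse of (1): continuous-time to discrete-time frequency."""
    return 2.0 * math.atan(big_omega / 2.0)


def scaled_frequency(omega, a):
    """Lambda_a(omega) = f^{-1}[a f(omega)] for the bilinear transform."""
    return unwarp(a * warp(omega))


if __name__ == "__main__":
    w = 0.7
    # Dilating by 2 and then by 3 matches a single dilation by 6.
    print(scaled_frequency(scaled_frequency(w, 2.0), 3.0))
    print(scaled_frequency(w, 6.0))
    # The mapped frequency always stays inside (-pi, pi).
    print(scaled_frequency(3.0, 50.0))
```

The composition property shown in the usage lines follows directly from the definition of $\Lambda_a$ and is what allows any positive real dilation factor to be used in discrete time.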

For a stochastic input sequence, if the input X(n) of the
discrete-time scaling operator $S_a[\cdot]$ is a discrete-time
wide-sense stationary random process with power spectral density
$P_X(\omega)$, it was shown that the output is also wide-sense
stationary with power spectral density given by

$$P_Y(\omega) = \frac{a^2 \, P_X[\Lambda_a(\omega)]}{|\Lambda_a'(\omega)|} \quad (3)$$

where $\Lambda_a'(\omega)$ is the first derivative of $\Lambda_a(\omega)$
with respect to $\omega$.


Figure 1. Block diagram of the discrete-time scaling function

$$(u)_0 = 1 \quad (13)$$


2.2  Discrete-Time  Self-Similarity 


Using the discrete-time continuous-dilation scaling operator
$S_a[\cdot]$ in (2), discrete-time stochastic self-similar signals can
be defined as follows: a discrete-time random signal X(n) is said to
be self-similar with degree H in the wide sense if it satisfies the
following equations

$$E[S_a[X(n)]] = a^{-H} E[X(n)] \quad (4)$$

and

$$R_{S_a X}(n, n') = a^{-2H} R_X(n, n') \quad (5)$$

for any a > 0, where $R_X(n, n')$ is the autocorrelation function of
the sequence X(n) and $R_{S_a X}(n, n')$ is that of the scaled sequence.
For a discrete-time wide-sense stationary random process, the condition
of self-similarity simply reduces to

$$\frac{P_X[\Lambda_a(\omega)]}{|\Lambda_a'(\omega)|} = a^{-2H-2} P_X(\omega) \quad (6)$$

where $P_X(\omega)$ is the power spectral density of the signal.
Therefore, a stationary random process X(n) whose power spectral
density satisfies (6) is a self-similar signal in the statistical
sense. Zhao and Rao suggested the following form for the power
spectral density:

$$P_X(\omega) = \frac{|f(\omega)|^r}{|f'(\omega)|} \quad (7)$$

where $f'(\omega)$ is the first derivative of f with respect to $\omega$.
From (6), (7) and $\Lambda_a(\omega) = f^{-1}[a f(\omega)]$,

$$\frac{P_X[\Lambda_a(\omega)]}{|\Lambda_a'(\omega)|} = a^{r-1} P_X(\omega) . \quad (8)$$

Thus, X(n) is a self-similar random process with $H = -(r + 1)/2$.
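The scaling relation in (8) can be verified numerically for the bilinear transform. The sketch below reads (7) as $P_X(\omega) = |f(\omega)|^r / |f'(\omega)|$ and evaluates the derivative of $\Lambda_a$ by a central finite difference, so that the check does not rely on the chain rule used in the derivation. The function names, the step size and the test values are assumptions made for this example.

```python
import math


def f(w):
    """Bilinear warping f(w) = 2 tan(w/2)."""
    return 2.0 * math.tan(w / 2.0)


def f_prime(w):
    """Derivative of the bilinear warping, 1 / cos^2(w/2)."""
    return 1.0 / math.cos(w / 2.0) ** 2


def f_inv(big_w):
    """Inverse bilinear warping."""
    return 2.0 * math.atan(big_w / 2.0)


def psd(w, r):
    """Assumed reading of (7): |f(w)|^r / |f'(w)|."""
    return abs(f(w)) ** r / abs(f_prime(w))


def lam(w, a):
    """Lambda_a(w) = f^{-1}[a f(w)]."""
    return f_inv(a * f(w))


def check_scaling(w, a, r, h=1e-6):
    """Return both sides of (8) at frequency w for dilation factor a."""
    lam_prime = (lam(w + h, a) - lam(w - h, a)) / (2.0 * h)
    lhs = psd(lam(w, a), r) / abs(lam_prime)
    rhs = a ** (r - 1.0) * psd(w, r)
    return lhs, rhs


if __name__ == "__main__":
    for a in (0.5, 2.0, 3.7):
        print(a, check_scaling(0.9, a, r=-1.6))
```

With r = -1.6 the relation $H = -(r + 1)/2$ gives H = 0.3, and the two sides of (8) printed above should agree up to the finite-difference error.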

If the power spectral density $P_X(\omega)$ satisfies the
Paley-Wiener condition, the density can be factorized as a product
$L(\omega) L^*(\omega)$ and by passing white noise through a linear
system with frequency response $L(\omega)$, the corresponding
stochastic self-similar process can be generated.

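The power spectral density for the BLT is

$$P_X(\omega) = \frac{|f(\omega)|^r}{|f'(\omega)|} = 2^r \left[ \frac{1 - \cos^2(\omega/2)}{\cos^2(\omega/2)} \right]^{r/2} \cos^2(\omega/2) \quad (9)$$

and this was known to satisfy the Paley-Wiener condition.
Let $z = e^{j\omega}$, then $P_X(\omega)$ transforms to

$$P_X(z) = L(z) L(z^{-1}) \quad (10)$$

and

$$(u)_v = u(u + 1)(u + 2) \cdots (u + v - 1) = \frac{\Gamma(u + v)}{\Gamma(u)} \quad (14)$$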

The impulse response corresponding to $L_2(z)$ is a 2-tap filter with
coefficients given by

$$l_2(0) = l_2(1) = 2^{r/2 - 1} \quad (15)$$

The overall impulse response $l(n)$ corresponding to the
system transfer function given in (11) can be represented by two
cascaded filters $l_1(n)$ and $l_2(n)$.

2.3  LSI  System 

A linear scale-invariant (LSI) system is a linear operator
$L\{\cdot\}$ whose output is invariant to scale changes of the input
signals, that is,

$$y(n) = L\{x(n)\} \Rightarrow S_a[y(n)] = L\{S_a[x(n)]\} \quad (16)$$

where x(n) and y(n) are the input and output sequences,
respectively.

A discrete-time causal LSI system for a given x(n) can be
defined similarly to the continuous-time case [14]. Let h(k) be any
one-dimensional discrete-time sequence. The discrete-time causal
LSI system is defined by the following relationship:

$$y(n) = \sum_{k=1}^{\infty} h(k) \, S_k[x(n)] / k \quad (17)$$

The output of the system is the sum of a series of dilations of the
input sequence by k that are linearly weighted by h(k)/k.

If the input of the LSI system is a discrete-time stochastic
self-similar signal with degree H, then the output is also a
stochastic, self-similar signal with degree H [17]-[19]. In addition,
if the input to a discrete-time LSI system is a discrete-time
wide-sense stationary random process, the output of the system is
non-stationary due to the fact that the system is time-varying. Using
this property, a non-stationary self-similar random signal with
parameter $H = -(r + 1)/2$ can be generated by first generating a
discrete-time self-similar random process with degree H by
passing zero-mean white noise through a linear system with a
frequency response given by (11), and then passing the signal thus
obtained through a discrete-time LSI system. Note that the choice
of the one-dimensional function h(k) in the discrete-time LSI
system is arbitrary. This provides flexibility in signal construction:
h(k) can be chosen so that the output of the system has certain
desired properties.

3.  KERNEL  REPRESENTATION  FOR 
DISCRETE-TIME  CONTINUOUS- 
DILATION  SCALING 


where the causal part L(z) is

$$L(z) = 2^{r/2 - 1} (1 - z^{-1})^{r/2} (1 + z^{-1})^{1 - r/2} \quad (11)$$

Note that the spectrum is rational only for integer values of r.

The corresponding impulse response of $L_1(z)$ is a causal filter
whose coefficients are given by

fl  n  =  0 


(-l)-(r/2)J 

k=0 


(r/2-  k  +  !)„_, 
k !(«  -  k)\ 


n  >  0 


(12) 


where $(\cdot)_n$ is the Pochhammer symbol, defined as


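Let p(n,t) and s(n,t) be linear operator kernels effecting
transformation of signals between the discrete-time and
continuous-time domains as

$$x(t) = \sum_{n=-\infty}^{\infty} x(n) \, p(n,t)$$

$$x(n) = \int_{-\infty}^{\infty} x(t) \, s(n,t) \, dt \quad (18)$$

The warping defined in the previous section is a special case with

$$p(n,t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \exp[j(\Omega t - f^{-1}(\Omega) n)] \, d\Omega \quad (19)$$

and

$$s(n,t) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \exp[j(\omega n - f(\omega) t)] \, d\omega \quad (20)$$

It follows that

$$\int_{-\infty}^{\infty} p(k,t) \, s(n,t) \, dt = \delta(n - k) \quad (21)$$

and

$$\sum_{n=-\infty}^{\infty} s(n,\tau) \, p(n,t) = \delta(t - \tau) \quad (22)$$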


where $\delta$ stands for the discrete and continuous impulse functions
depending on its context. The discrete-time scaling operation is
now defined using the relations in Equation (18). Let $x_a(n)$ denote
the time-scaling of a discrete-time signal x(n) by a factor a. Let
x(t) be the continuous-time equivalent of x(n) obtained through the
transformation in Equation (18). We define

$$x_a(n) = \int_{-\infty}^{\infty} x(t/a) \, s(n,t) \, dt = \int_{-\infty}^{\infty} \left[ \sum_{k=-\infty}^{\infty} x(k) \, p(k, t/a) \right] s(n,t) \, dt = \sum_{k=-\infty}^{\infty} x(k) \int_{-\infty}^{\infty} p(k, t/a) \, s(n,t) \, dt \quad (23)$$

Thus, with

$$g_a(k,n) = \int_{-\infty}^{\infty} p(k, t/a) \, s(n,t) \, dt \quad (24)$$

the scaling operator defined by the kernel is

$$x_a(n) = D_a\{x(n)\} = \sum_{k=-\infty}^{\infty} x(k) \, g_a(k,n) \quad (25)$$

When a = 1, the relationship becomes

$$x(n) = \sum_{k=-\infty}^{\infty} x(k) \int_{-\infty}^{\infty} p(k,t) \, s(n,t) \, dt = \sum_{k=-\infty}^{\infty} x(k) \, \delta(k - n) \quad (26)$$

One of the key properties of the scaling operator $D_a$ is
invertibility, that is, $D_a$ followed by $D_{1/a}$ yields the original
discrete-time signal.


4.  KERNEL  BASED  DLSI  SYSTEMS 


We now define scale-invariance for a discrete-time system
as before (see Equation (16)) except that it must hold for the
operator $D_a$. We will show that it is possible to effect DLSI
systems simply by effecting discrete-time to continuous-time
transformation using the p(n,t) kernel, implementing a
time-varying convolution corresponding to a continuous-time scale
invariant system and then transforming the result to discrete time
using the s(n,t) kernel.

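Thus, given an input sequence x(n) and a DLSI system
characterized by a sequence h(n), we first form

$$x(t) = \sum_{n} x(n) \, p(n,t), \qquad h(t) = \sum_{n} h(n) \, p(n,t) \quad (27)$$

and then

$$y(t) = \int h(\tau) \, x(t/\tau) \, \frac{d\tau}{\tau} = \int \sum_{l} h(l) \, p(l,\tau) \sum_{m} x(m) \, p(m, t/\tau) \, \frac{d\tau}{\tau} \quad (28)$$

The output y(n) of the DLSI system is obtained as

$$y(n) = \int y(\alpha) \, s(n,\alpha) \, d\alpha$$

$$= \int \int \sum_{l} h(l) \, p(l,\tau) \sum_{m} x(m) \, p(m, \alpha/\tau) \, \frac{d\tau}{\tau} \, s(n,\alpha) \, d\alpha$$

$$= \sum_{l} \sum_{m} h(l) \, x(m) \int \int p(l,\tau) \, p(m, \alpha/\tau) \, s(n,\alpha) \, \frac{d\tau}{\tau} \, d\alpha$$

$$= \sum_{l} \sum_{m} h(l) \, x(m) \, k_{l,m}(n) \quad (29)$$

where

$$k_{l,m}(n) = \int \int p(l,\tau) \, p(m, \alpha/\tau) \, s(n,\alpha) \, \frac{d\tau}{\tau} \, d\alpha \quad (30)$$

Contrast this result with the expression in Equation (17). Unlike
the previous expression, Equation (29) preserves symmetry
between x(n) and h(n), much like the continuous-time LTI systems
of Wornell.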

There is another attractive property. Let

$$t_a(n) = \int t^a \, s(n,t) \, dt . \quad (31)$$

With $t_a(n)$ as input to the DLSI system, we find the output is

$$y(n) = H(a) \, t_a(n) \quad (32)$$

where

$$H(a) = \sum_{l} h(l) \, P_l(a) \quad (33)$$

with

$$P_l(a) = \int p(l,\tau) \, \tau^{-a} \, \frac{d\tau}{\tau} . \quad (34)$$

Thus $t_a(n)$ is an eigenfunction of the DLSI system.
Suppose

$$X(a) = \sum_{l} x(l) \, P_l(a) \quad (35)$$

and

$$Y(a) = \sum_{l} y(l) \, P_l(a) \quad (36)$$

Then

$$Y(a) = H(a) X(a) \quad (37)$$

We thus have the beginnings of a scale-domain transform
operator similar to the Fourier transform for linear time-invariant
systems.


Figure 2. Block diagram of the kernel-based DLSI system


5.  CONCLUSION 

The discrete-time LSI systems proposed previously by Zhao
and Rao provide a potential tool for the analysis and simulation of
natural self-similar signals because of their scale-invariant
property (even though they are time-varying in general) in
continuous scale. This earlier approach was based on a tacit
warping in the frequency domain. The paper has provided an
alternative approach based on linear kernels for transforming
discrete-time signals to continuous time and vice versa. The
resulting formulation of DLSI systems has several attractive
properties. We have shown that such systems may be amenable to
analysis using scale-domain transforms that are analogous to
Fourier transforms. We believe the discrete-time LSI system
formulation occupies a place in the study of scale-invariance and
self-similarity that corresponds to the position of linear
discrete-time time-invariant systems in the study of stationary random
processes.

ABSTRACT

We define and study stochastic discrete scale invariance (DSI), a
property which requires invariance by dilation for certain preferred
scaling factors only. We prove that the Lamperti transformation,
known to map self-similar processes to stationary processes, is an
important tool to study these processes and gives a more general
connection: in particular between DSI and cyclostationarity. Some
general properties of DSI processes are given. Examples of random
sequences with DSI are then constructed and illustrated. We
address finally the problem of analysis of DSI processes, first using
the inverse Lamperti transformation to analyse DSI processes by
means of cyclostationary methods. Second, we propose to re-write
these tools directly in a Mellin formalism.

1.  DISCRETE  SCALE  INVARIANCE 

Scale invariance, also called self-similarity, is frequently
invoked. Its central point is that a signal is scale invariant
if it is equivalent to any of its rescaled versions, up
to some amplitude renormalization [1]. More precisely, a
function $X(t)$ is scale-invariant with exponent $H$, or $H$-ss,
if for any $k \in \mathbb{R}$: $X(kt) = k^H X(t)$.

This definition is given here for a deterministic signal.
The concept can be extended to stochastic signals when one
thinks of the previous equality in a probabilistic way: the
equality of the finite-dimensional probability distributions
[1]. We will write $\stackrel{d}{=}$ for this equality.

The strict notion of scale invariance, valid for all dilation
factors as above, is in some cases too rigid; the middle-third
Cantor set, for example, is invariant only under dilations
by a factor 3 (or a power of 3). Several weakened versions
of self-similarity have been proposed to broaden the relevance
of scale invariance, and one is of special interest here:
requiring invariance by dilation for certain preferred scaling
factors only, as is the case for the Cantor set. This is
known as discrete scale invariance (DSI), a concept which
has been stressed by Sornette and Saleur [2, 3] as an
efficient model in many situations (fracture, DLA, critical
phenomena, earthquakes).


They studied DSI as a property of deterministic signals,
and provided general arguments as to why DSI should occur
naturally: classical scenarios involve the existence of a
characteristic scale, the appearance of a preferred scale
through an instability, or more general arguments in
non-unitary field theories [4]. They also found ways to
estimate the preferred scaling ratio in this context, based on
classical spectral analysis (Lomb periodogram).

As far as we know, this property has not been considered
for stochastic processes, a framework which is often useful
to have when dealing with real measurements, as it
allows the use of statistical signal processing methods. The
extension of the DSI property to stochastic processes is
straightforward. We propose the following definition.

A process $\{X(t),\ t \in \mathbb{R}^+\}$ has discrete scale
invariance with scaling exponent $H$ and scale $\lambda$ if

$$X(\lambda t) \stackrel{d}{=} \lambda^H X(t), \quad t \in \mathbb{R}^+. \tag{1}$$

We will refer to this property as $(H, \lambda)$-DSI. The equality
here is the probabilistic equality. In the following, only
the wide-sense property will be used (second-order statistical
properties only).

2.  LAMPERTI  TRANSFORM  :  DSI  AS  AN  IMAGE 
OF  CYCLOSTATIONARITY 

2.1.  Lamperti  transformation 

A main issue is to find a way to study DSI processes both
theoretically and practically. The answer is given by a
transformation introduced by J. Lamperti in 1962 [5], which is
an isometry between self-similar and stationary processes.
It will be called the Lamperti transformation and is defined
as follows.

For any process $\{Y(t),\ t \in \mathbb{R}\}$, its Lamperti transform
$\{X(t),\ t \in \mathbb{R}^+\}$ and its inverse are given by

$$X(t) = (\mathcal{L}Y)(t) = t^H\, Y(\ln t), \quad t \in \mathbb{R}^+; \tag{2}$$
$$Y(t) = (\mathcal{L}^{-1}X)(t) = e^{-Ht}\, X(e^t), \quad t \in \mathbb{R}. \tag{3}$$
The theorem in the paper of Lamperti is that a process
$Y(t)$ is stationary if and only if its Lamperti transform
$X = \mathcal{L}Y$ is $H$-ss. The central argument of the derivation is that
the Lamperti transformation maps a time-shifted process to
the dilated version of the Lamperti transform of the original
process. Let $(\mathcal{D}_\lambda X)(t) = \lambda^{-H} X(\lambda t)$ be the dilation
operator and $(\mathcal{S}_\tau Y)(t) = Y(t + \tau)$ the time-shift operator. The
property is that

$$(\mathcal{L}^{-1}\mathcal{D}_\lambda\mathcal{L}\,Y)(t) = (\mathcal{S}_{\ln\lambda}Y)(t). \tag{4}$$
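The correspondence between dilation and time shift can be checked numerically on a single deterministic path. The following is only a minimal sketch: the function names, the test path `y` and the values of `H` and `lam` are illustrative assumptions, not part of the original paper.

```python
import math

def lamperti(y, H):
    """Return X = L Y, i.e. X(t) = t**H * Y(ln t) for t > 0 (eq. 2)."""
    return lambda t: t ** H * y(math.log(t))

def inverse_lamperti(x, H):
    """Return Y = L^{-1} X, i.e. Y(t) = exp(-H t) * X(exp(t)) (eq. 3)."""
    return lambda t: math.exp(-H * t) * x(math.exp(t))

def dilate(x, lam, H):
    """Dilation operator: (D_lam X)(t) = lam**(-H) * X(lam * t)."""
    return lambda t: lam ** (-H) * x(lam * t)

def shift(y, tau):
    """Time-shift operator: (S_tau Y)(t) = Y(t + tau)."""
    return lambda t: y(t + tau)

# Illustrative values (assumed for this sketch only).
H, lam = 0.3, 2.5
y = lambda t: math.cos(1.7 * t) + 0.2 * t      # arbitrary test path

lhs = inverse_lamperti(dilate(lamperti(y, H), lam, H), H)
rhs = shift(y, math.log(lam))
roundtrip = inverse_lamperti(lamperti(y, H), H)
```

Evaluating `lhs` and `rhs` at a few instants should give the same numbers up to rounding, which is the content of (4) for one realization.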

Understanding this correspondence between time-shift
and dilation operators, we can propose many variations around
Lamperti's theorem, relaxing in some way the stationarity
of $Y$ and the self-similarity of $X$. We will only consider
the DSI property here, but some results about different
classes of processes and their description are proposed in
[6]. A useful property is that one can give the (potentially
nonstationary) correlation function of the Lamperti transform
$X$ of a process $Y$:

$$E\{X(t)X(s)\} = R_X(t, s) = (st)^H R_Y(\ln t, \ln s). \tag{5}$$

In recent years some results have been obtained for
$H$-ss processes with this transformation. Yazici and Kashyap
proposed a general description of wide-sense self-similar
processes and linear models for $H$-ss processes [7]. Burnecki et al.
study $\alpha$-stable and $H$-ss processes with this transform [8].
Nuzman and Poor give important results about the prediction,
the whitening and the interpolation of $H$-ss processes,
mainly applied to fractional Brownian motion [9].
Finally, Vidacs and Virtamo [10] proposed a method for
estimating $H$ for an fBm, based on the same idea. All these
authors use the inverse Lamperti transformation (3) to map
the question to a stationary problem and then use the known
results for stationary problems in this context. Our objective is
to show that nonstationary methods can be adapted in the
same way, especially for DSI.

2.2.  DSI  and  cyclostationarity 

A process is called cyclostationary [11] or periodically
correlated [12, 13] if its correlation function is periodic in
time. More precisely, if a period $T$ is given, a process
$\{Y(t),\ t \in \mathbb{R}\}$ is wide-sense cyclostationary if it satisfies,
for any times $t$, $s$,

$$E\,Y(t+T) = E\,Y(t), \qquad E\{Y(t+T)Y(s+T)\} = E\{Y(t)Y(s)\}. \tag{6}$$

The correlation function $R_Y(t, t+\tau)$ is then periodic in $t$
with period $T$ and one can decompose $R_Y$ in a Fourier series

$$R_Y(t, t+\tau) = \sum_{n=-\infty}^{+\infty} C_n(\tau)\, e^{i2\pi n t/T}. \tag{7}$$


Using the definitions of cyclostationarity and $(H, \lambda)$-DSI
and the correspondence (4), we can state the following
important result.

A process $\{Y(t),\ t \in \mathbb{R}\}$ is cyclostationary of period
$T$ if and only if its Lamperti transform of parameter $H$,
$\{X(t) = t^H Y(\ln t),\ t \in \mathbb{R}^+\}$, is $(H, e^T)$-DSI.
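At second order, the result can be illustrated with a toy correlation function. The sketch below assumes a cyclostationary process of the form $Y(t) = m(t)W(t)$ with $m$ periodic of period $T$ and $W$ stationary; this model, the exponential correlation and all numerical values are assumptions chosen for illustration. It checks the wide-sense DSI relation $R_X(\lambda t, \lambda s) = \lambda^{2H} R_X(t, s)$ with $\lambda = e^T$.

```python
import math

T = math.log(2.0)        # period of the cyclostationary process (assumed)
lam = math.exp(T)        # corresponding preferred scale ratio, here 2
H = 0.4                  # scaling exponent (assumed)

def r_y(t, s):
    """Correlation of Y(t) = m(t) W(t): m is T-periodic, W is stationary."""
    m = lambda u: 1.0 + 0.5 * math.cos(2 * math.pi * u / T)
    return m(t) * m(s) * math.exp(-abs(t - s))

def r_x(t, s):
    """Correlation of X = L Y from eq. (5): (st)^H R_Y(ln t, ln s)."""
    return (s * t) ** H * r_y(math.log(t), math.log(s))

def dsi_defect(t, s):
    """Difference between R_X(lam t, lam s) and lam^(2H) R_X(t, s)."""
    return r_x(lam * t, lam * s) - lam ** (2 * H) * r_x(t, s)
```

`dsi_defect` should vanish up to rounding for any positive `t` and `s`, while the same test with a ratio other than a power of `lam` generally does not.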

This is one possible extension of Lamperti's theorem,
and one of importance in our study of DSI. A first consequence,
using (5), is that the general form of the covariance of
$(H, \lambda)$-DSI processes is naturally expressed on a Mellin basis:

$$R_X(t, kt) = k^H t^{2H} \sum_{n=-\infty}^{+\infty} C_n(k)\, t^{i2\pi n/\ln\lambda}. \tag{8}$$

Note that if the process $X$ is real-valued, a necessary
condition is imposed: $C_{-n}(k) = C_n^*(k)$. The Mellin function
$t^{H + i2\pi n/\ln\lambda}$ in (8) is central in the study of DSI
processes. This is not a surprise: the Lamperti transformation maps
the Fourier basis (invariant up to a phase under time shift) to
the Mellin basis (invariant up to a phase under dilation, and
also having the deterministic DSI property). We stress the fact
that the Mellin functions are a basis and that they have an
associated transformation which can be computed numerically
[14].

3.  EXAMPLES  OF  PROCESSES  AND  SEQUENCES 
WITH  DSI 

Continuous-time systems with the DSI property are easily
constructed. Applying $\mathcal{L}$ to an ARMA($p$, $q$) system, we obtain
a generalization of the Euler-Cauchy (EC) system. It is a
model for self-similar processes [7], driven by a multiplicative
Gaussian noise $\eta(t)$ whose correlation is
$E\{\eta(t)\eta(s)\} = t\sigma^2\delta(t - s)$. The process $X(t)$ satisfies

$$\sum_{n=0}^{p} b_n t^n \frac{d^n}{dt^n} X(t) = \sum_{m=0}^{q} a_m t^m \frac{d^m}{dt^m} \eta(t). \tag{9}$$

In the same manner that a nonstationary ARMA model with
periodic time-varying coefficients is cyclostationary [15],
one obtains a DSI model when taking log-periodic
time-varying coefficients $a_m$ and $b_n$ in the (EC) system. This
will not be detailed further.

In order to obtain DSI processes in discrete time (random
sequences with self-similarity and log-periodicity), one
possibility is to consider a discrete-time system analogous to
(EC) ($H$-ss in a certain way), and then introduce log-periodicity
in the coefficients. We describe two approaches here.

A direct discretization in time of the (EC) system is
given by the integration of its evolution between two
instants. This was proposed in [16] for the first order. This
nonstationary $H$-ss system is written as $X_k = a[k] X_{k-1} + \epsilon_k$,
where $a[k] \simeq 1 - \alpha/k$ and $\epsilon_k$ is a time-decorrelated
Gaussian noise with variance $E\,\epsilon_k^2 \propto k^{2H-1}$ when $k$ is
large. The generalization to the discretization of (EC) of
order $n$ is straightforward. For large times $k$, the result is
of the form

$$(1 - B)^n X_k + \alpha_1 k^{-1} (1 - B)^{n-1} X_k = k^{-2}\, \mathrm{AR}(n-1)\, X_{k-1} + \epsilon_k + O(k^{-3}) \tag{10}$$

where $B$ is the backward operator, and AR is an AR model.

Such a system with log-periodicity in the coefficient $\alpha_1$
and in the AR, or in cascade with a log-periodic AR system
(see for example the AR(1) proposed hereafter, equation 12),
will present an approximate DSI property. The
reader can see on the left of figure 1 a realization of such a
process.

Fig. 1. Typical realizations of DSI random sequences. On the left
the model is a discretized (EC) system of order 2, in cascade with
a log-periodic AR(1). On the right it is constructed on the fractional
difference (see text). The length is 5000 points, $H = 0.1$, $\lambda = 1.0$.
The oscillations above the signals are indicative of the
log-periodicity of the AR. $f_0 = 1/8$, $\rho = 0.6$, $\beta_r = 0.25$ and $\beta_f = 0$.

Another class of discrete-time self-similar systems is
given by models constructed on the fractional difference
operator. The usual method is to use its moving average
representation written as a binomial expansion. We prefer to use
the discretization proposed in [17], constructed with a
generalization of the bilinear transformation in order to
define a scaling operator for sequences. The fractional
difference operator is then a filter $h[n]$ whose impulse response
is

$$h[n] = \sum_{k=0}^{n} \frac{(-1)^k\, \Gamma(r+k)\, \Gamma(-r+n-k)}{\Gamma(k+1)\, \Gamma(n-k+1)\, \Gamma(r)\, \Gamma(-r)}. \tag{11}$$
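Read as a sum over $k$, the impulse response (11) is the convolution of the power-series coefficients of $(1 + z^{-1})^{-r}$ and $(1 - z^{-1})^{r}$, that is, the expansion of the $r$-th power of a bilinear-type map. The sketch below rests on that reading, which is an assumption about the garbled formula and about [17]; the function name is hypothetical. It uses the two-term recursions of the binomial coefficients rather than evaluating gamma functions near their poles.

```python
def fractional_difference_filter(r, length):
    """First `length` taps of the series of ((1 - x) / (1 + x))**r in x = z^-1."""
    num = [1.0]   # coefficients of (1 - x)**r:    Gamma(j - r) / (Gamma(j + 1) Gamma(-r))
    den = [1.0]   # coefficients of (1 + x)**(-r): (-1)^k Gamma(r + k) / (Gamma(k + 1) Gamma(r))
    for k in range(1, length):
        num.append(num[-1] * (k - 1 - r) / k)
        den.append(-den[-1] * (r + k - 1) / k)
    # h[n] = sum_{k=0}^{n} den[k] * num[n - k], the discrete convolution in (11)
    return [sum(den[k] * num[n - k] for k in range(n + 1)) for n in range(length)]

h_unit = fractional_difference_filter(1.0, 5)   # r = 1: plain bilinear difference
h_half = fractional_difference_filter(0.5, 5)   # r = 1/2: half-order difference
```

A simple consistency check is that the half-order filter convolved with itself reproduces the first-order one.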


4.  ANALYSIS  BY  DELAMPERTIZATION 

Faced with a general class of processes (or random sequences
in the context of numerical processing) which are nonstationary,
or of unknown structure, one has to find methods to
analyse them. Given a sequence $X_n$ suspected of DSI, the
simplest way of analysis is to find the presumed associated
cyclostationary process by applying $\mathcal{L}^{-1}$.

Generally speaking, classical stationary methods are useful
for analysing self-similar processes after “delampertization”
of the signal. This was the essence of the papers on $H$-ss
processes cited before [7, 8, 9]. Nonstationary methods can
then be used to study classes of processes which are not
properly self-similar, but which have some kind of
nonstationarity with regard to dilation: a nonstationarity in scale.
DSI is then only a first interesting example of a precise kind
of nonstationarity in scale.

Before using cyclostationary methods, a practical problem
must be considered: how can the inverse Lamperti
transformation be computed in discrete time? First, it needs either
a nonlinear sampling $t = q^n$ of the data (which is rarely
available with real signals), or an interpolation to obtain the
data on this geometric sampling grid, given a signal $X_n$ with
the usual arithmetic sampling: the corresponding sequence $Y_t$
is known for $t = \ln n$, with $n \in \mathbb{N}$, and we want it for
$t = m$, $m \in \mathbb{Z}$. Figure 2 shows on the left the sequence $Y$
constructed from the second process in figure 1.
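The geometric resampling step can be sketched as follows. Linear interpolation and the free ratio `q` are assumptions made for this illustration (the paper does not specify the interpolator), and the function name is hypothetical.

```python
import math

def delampertize(x, H, q):
    """Geometric resampling of x, where x[0] is the sample at instant n = 1.

    Returns y[m] = q**(-H * m) * x(q**m) for m = 0, 1, ... while q**m <= N,
    with x(.) obtained by linear interpolation between integer instants.
    Requires len(x) >= 2 and q > 1.
    """
    n_max = len(x)
    y = []
    m = 0
    while q ** m <= n_max:
        t = q ** m
        lo = min(int(math.floor(t)), n_max - 1)   # lower integer instant, keeps lo + 1 in range
        frac = t - lo
        value = (1.0 - frac) * x[lo - 1] + frac * x[lo]
        y.append(q ** (-H * m) * value)
        m += 1
    return y

# Sanity check on a noiseless power law: x(n) = n with H = 1 maps to a constant.
x = [float(n) for n in range(1, 101)]
y = delampertize(x, H=1.0, q=1.5)
```

With 100 input samples and `q = 1.5` only 12 output samples remain, which shows how quickly geometric sampling thins the data.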

A  second  difficulty  is  that  H  is  a  priori  unknown.  Using 
the  transformation  of  parameter  H  seems  tricky...  In  fact 
the  tools  used  thereafter  have  not  been  found  to  be  sensitive 
to  this  amplitude  effect.  The  cyclostationary  tools  are  found 
unaffected  if  one  uses  H  =  0.5  to  delampertize  the  process 
in  place  of  the  real  H. 

We tested the applicability of these ideas on synthetic
sequences. As an example of a classical cyclostationary tool,
we implemented the methods proposed in [18]. In a nutshell,
the algorithm to estimate a time-smoothed cyclic cross
periodogram is as follows. First the signal is decomposed into
$N$ segments of length $L$ in order to average over these parts.
A filtered and decomposed version is computed, where $h$ is
a data tapering window:

$$Y_T(n, f) = \sum_{l=-N/2}^{N/2} h(l)\, y(n - l)\, e^{-i2\pi f (n-l) T_s}. \tag{13}$$


This filter is in cascade with a nonstationary AR filter
whose coefficients are log-periodic. For example, we may
limit ourselves to the first order (coefficient $l_2$), taking care
that the filter is stable at each instant:

l2  =  (p  +  PT  COS^P)  ei2^o(l+/?/cos(2^))_  (12) 

An example of such a signal is shown on the right of fig. 1.


Then the spectral components $Y_T(n, \cdot)$ are correlated at
frequencies $f - \nu_c/2$ and $f + \nu_c/2$ by a multiplier followed
by a low-pass filter $g$:

$$S_{Y_T}^{\nu_c}(\nu, f) = \sum_n Y_T(n, f - \nu_c/2)\, Y_T^*(n, f + \nu_c/2)\, g(\nu - n).$$

This is an estimate of the spectral cross correlation. The
usual spectrum is distributed on the main diagonal $\nu_c = 0$,
and for cyclostationary sequences it presents non-zero
distributions on $\nu_c = \pm 1/T$ (and possibly on higher
harmonics). The marginal in cyclic frequency of this spectrum
then has sharp peaks at $\pm 1/T$, where $\lambda = e^T$ for DSI, and
gives a reliable estimate of $\lambda$. Figure 2 shows the result of
this procedure for the synthetic model described before.

Fig. 2. On the left is shown the cyclostationary sequence obtained
by applying $\mathcal{L}^{-1}$ to the signal plotted on the right of fig. 1. The
marginal in cyclic frequency is represented on the right. The main
peak at the center is the total energy of the signal. The two
symmetric peaks (marked by arrows) are an indication of
cyclostationarity and are located at frequencies $\pm 2\pi/\ln\lambda$.
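The idea of reading the period off a peak in cyclic frequency can be illustrated with a much simpler statistic than the time-smoothed cyclic cross periodogram of [18]: the discrete Fourier transform of the instantaneous power $y[n]^2$, which is a zero-lag stand-in for the cyclic-frequency marginal. The sketch below is that simplified substitute, not the algorithm of [18]; the amplitude-modulated test signal, the seed and all parameter values are assumptions.

```python
import cmath
import math
import random

def cyclic_marginal(y, n_freqs):
    """|sum_n y[n]^2 exp(-i 2 pi k n / N)| at cyclic frequencies k/N, k = 0..n_freqs-1."""
    n = len(y)
    power = [v * v for v in y]
    out = []
    for k in range(n_freqs):
        w = -2j * math.pi * k / n
        out.append(abs(sum(p * cmath.exp(w * i) for i, p in enumerate(power))))
    return out

random.seed(0)
T = 16        # period in samples (assumed for the demo)
N = 1024      # number of samples after delampertization (assumed)
y = [(1.0 + 0.8 * math.cos(2 * math.pi * i / T)) * random.gauss(0.0, 1.0)
     for i in range(N)]

marginal = cyclic_marginal(y, N // 2)
# Skip k = 0, which carries the total energy, and locate the strongest cyclic line.
peak = max(range(1, N // 2), key=lambda k: marginal[k])
estimated_period = N / peak
```

For a delampertized DSI sequence sampled with unit step, the estimated period $T$ would translate into a scale ratio $\lambda = e^T$.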


5.  TOWARD  MELLIN-BASED  TOOLS 

Another way of thinking might be fruitful for analysing DSI
processes. We can formulate the methods directly in a Mellin
formalism, with no geometrical resampling. That is to say,
we perform a “lampertization” of the tools, whereas the first
approach proposed to “delampertize” the signal under study.

By direct interpolation we have few details for the short
times (in fact we cannot reconstruct $m < 0$) and we ignore
many details at long times (taking one point among
many). To obtain statistical relevance, one needs a
huge number of points in the original data. The advantage,
remarked in [8, 10], is that there are fewer points in $X$, and
hence in $Y$, after geometrical resampling, and this keeps the
computational cost low.

When a large number of points is not available,
geometric sampling loses much information about the
signal. As the Fourier transform of a process is related to
the Mellin transform of the process transformed by $\mathcal{L}$, many
methods for cyclostationary processes can be written with
the Mellin transformation and used on processes with DSI. For
self-similar signals ($H = 0$), estimators constructed in this
way were given in [19] and can be adapted to take into
account an exponent $H$ and DSI.
ABSTRACT

In this paper we propose a modification of the
well-known Viterbi algorithm (VA) for communication
channels distorted by impulsive noise instead of the
commonly assumed Gaussian noise. Here we assume that the
parameters of the noise, e.g. the moments, are unknown.
Instead of applying a recursive solution (see [2]) by
repeated execution of the VA, we directly embed the
estimation of the unknown parameters into the
structure of the VA itself. Such an approach is called
Per-Survivor Processing (PSP) [8], which provides a
general framework for the approximation of Maximum
Likelihood Sequence Estimation (MLSE) whenever the
presence of unknown quantities prevents the precise
use of the classical VA. In addition, the classical VA
is modified so that it works optimally for some
kinds of impulsive noise. We show by means of
the modified VA that the bit-error rate can be
substantially decreased. In other words, with only minor
technical modifications, namely minimizing an adequate
nonlinear norm, the transmission becomes more reliable
compared to the usual Euclidean norm minimized
by the conventional VA.

Keywords: Viterbi Algorithm, Impulsive Noise,
Per-Survivor Processing, MLSE, Non-Euclidean Norms

1.  INTRODUCTION 

In the last decade, an increased interest in the modeling
of impulsive noise can be observed in the statistical
signal processing community. The reasons are mainly
twofold: (1) new insights into some special distributions,
especially the stable distribution, have been found, and (2)
non-Gaussian noise is found in many applications,
e.g. communication channels [6], [10], biology [4], sonar
[1]. Here we will focus on the problem of suppressing
additive impulsive noise in data transmission, where
the communication channel is assumed to be linear and,
for the time being, time-invariant with impulse response $h(k)$.


Figure  1:  A  simple  discrete  communication  channel  model 

We have omitted sampling, coding and modulation
for reasons of simplicity and deal only with sequences
like $h(k)$, $k \in \mathbb{Z}$, instead of time-continuous signals like
$h(t)$. The input sequence $X(k)$ (capital letters denote
random variables or random processes), which is
assumed to be an i.i.d. random process, consists only of
$\pm 1$ for each $k$ (a so-called BPSK sequence). Before the
data are sent, a short training sequence known
to the receiver is transmitted. Therefore, during the
training period the receiver is able to estimate the
impulse response. Note that in the PSP approach the
simultaneous estimation of the impulse response is also
possible (see [8]). However, here we will focus on the
simultaneous estimation of the noise properties only.
Thus, in the following we always assume a known
impulse response estimate $h(k)$.

The most popular method for data reconstruction
given $h(k)$ is the well-known Viterbi algorithm. If
we assume a limited time interval $k = 0(1)K-1$ for
transmission, exactly $M = 2^K$ different data sequences
$x_m(k)$ (realizations of random processes are written in
small letters) could be sent. Given the received
sequence $y(k)$, the Viterbi algorithm reconstructs the
transmitted data sequence $x_{m_0}(k)$, $m_0 \in \{1, \ldots, M\}$, by
minimizing the Euclidean norm

$$\sum_{k=0}^{K-1} \bigl(y(k) - h(k) * x_m(k)\bigr)^2, \tag{1}$$
regarding $m$ in a recursive and therefore very efficient
manner. We will denote this kind of VA as the
conventional Viterbi algorithm. The use of the Euclidean
norm can be mathematically substantiated by the
maximum-likelihood principle if the channel noise is
Gaussian (see e.g. [7], pp. 249). However, despite
the intuitive nature of the Euclidean norm, it is in no
sense optimal in the case of impulsive noise. Thus, deriving
a modified VA also based on the maximum-likelihood
principle could be quite successful. Moreover, it is well
known from information theory that the channel
capacity $C|_{\text{Gauss}}$ in the case of Gaussian noise is always
lower than $C|_{\text{Non-Gauss}}$ for non-Gaussian noise:

$$C|_{\text{Gauss}} < C|_{\text{Non-Gauss}}. \tag{2}$$

Since the channel capacity can be seen as an upper
bound for the maximum data rate, this inequality means
that on a non-Gaussian channel a higher bit rate is
possible in principle. Of course, eq. (2) requires the
same noise power in the Gaussian as well as in the
non-Gaussian case. Because the noise power, which
is simply a second-order moment for zero-mean noise,
does not necessarily exist for each random process
(in particular not for every kind of impulsive noise), we
introduce here the $p$-norm power of $N(k)$ as

fi^  =  (E{|JW})p,  (3) 

where $E\{\ldots\}$ denotes expectation. Before we proceed
with a modified VA, let us briefly present some popular
models for impulsive noise.

2.  MODELS  FOR  IMPULSIVE  NOISE 


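Since the introduction of the so-called stable distribution
by Shao and Nikias [5], [9] in the signal processing
community, this distribution has drawn a lot of
attention. A stable random variable $X$ is defined via
the characteristic function

$$\varphi_X(x) = e^{\,j\mu x - \gamma|x|^{\alpha}\left(1 + j\beta\, w(x,\alpha)\, \operatorname{sign}(x)\right)}, \tag{4}$$

where

$$w(x, \alpha) = \begin{cases} \tan\frac{\alpha\pi}{2} & \text{for } \alpha \neq 1 \\ \frac{2}{\pi}\log|x| & \text{for } \alpha = 1 \end{cases}$$

and

$$\operatorname{sign}(x) = \begin{cases} 1 & \text{for } x > 0 \\ 0 & \text{for } x = 0 \\ -1 & \text{for } x < 0. \end{cases}$$

$\mu \in \mathbb{R}$ is called the location parameter, $\beta \in [-1, 1]$
the symmetry parameter, $\alpha \in (0, 2]$ the characteristic
exponent and $\gamma \in \mathbb{R}^+$ the scale parameter. The stable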


distribution enjoys many useful properties such as the
linear stability theorem (see [5], p. 20, p. 24) and the
generalized central limit theorem (see [5], p. 25). The
only notable drawback of the stable distribution is
that, even in the symmetric case with $\beta = 0$, no
closed form for the probability density function (pdf)
$p_X(x)$ exists in general. Since it can be shown that a stable pdf
has algebraic tails (in fact, the smaller the value of $\alpha$,
the thicker the tails), only the fractional lower order
moments

$$E\{|X|^p\} < \infty \quad \text{for } p < \alpha,\ \alpha < 2 \tag{5}$$

exist (for $\alpha = 2$ all moments exist). This is the
reason for introducing the $p$-norm power. Besides the
stable distribution, the generalized Gaussian pdf ([3], p.
74), the generalized Cauchy pdf ([3], p. 78),
Middleton's pdfs as well as the Gaussian mixture pdf
are popular models for impulsive noise. Other pdfs
suitable for modeling impulsive noise, e.g. the Laplace
and Student-t pdfs, are special cases of the above
density functions. In this paper we will concentrate on
the Cauchy distribution; further investigations
concerning the remaining distributions will be done in the
near future. After these basics we can now proceed with
the modification of the Viterbi algorithm for additive
Cauchy noise.


3. THE CAUCHY-VITERBI ALGORITHM


A random variable $X$ is Cauchy-distributed if the pdf
is given by

$$p_X(x) = \frac{\sigma}{\pi(\sigma^2 + x^2)},$$

which is a special case of the generalized Cauchy pdf.
Note that the stable distribution is identical to the
Cauchy distribution for $\gamma = \sigma$, $\alpha = 1$, $\beta = \mu = 0$
([5], p. 14). To emphasize this special case we will
use $\gamma$ instead of $\sigma$ in the following. Observe that not
only the variance but also the mean of a Cauchy
random variable does not exist. To modify the Viterbi
algorithm, consider the multivariate density function of
$Y(k)$, $k = 0(1)K-1$:

$$f_{Y(0),\ldots,Y(K-1)}\bigl(y(0), \ldots, y(K-1)\bigr) = \prod_{k=0}^{K-1} f_{N(k)}\bigl(y(k) - h(k) * x_m(k)\bigr) = \prod_{k=0}^{K-1} \frac{\gamma}{\pi\left(\gamma^2 + \bigl(y(k) - h(k) * x_m(k)\bigr)^2\right)}. \tag{6}$$


Now, maximizing $f_{Y(0),\ldots,Y(K-1)}\bigl(y(0), \ldots, y(K-1)\bigr)$
according to the maximum-likelihood principle means
minimizing the following nonlinear norm with respect to $m$:

$$\min_m \sum_{k=0}^{K-1} \ln\left(\gamma + \frac{\bigl(y(k) - h(k) * x_m(k)\bigr)^2}{\gamma}\right), \tag{7}$$

which can be easily derived. Hence, the optimal
receiver for additive Cauchy noise consists of a VA with
a non-Euclidean norm. We will denote this method as
the Cauchy-Viterbi algorithm. Observe that this
norm includes the parameter $\gamma$, which is not known a
priori. However, a data-aided estimation procedure for
this unknown parameter, based on tentative low-delay
decisions at the VA output, can be applied. This
approach is shown in Fig. 2.

Figure 2: A data-aided estimation procedure based on
tentative low-delay decisions at the VA output

After the so-called low-delay $k_0$, the VA has reconstructed
the transmitted data $X(k - k_0)$. These data are filtered by the
estimated impulse response $h(k)$ and then subtracted
from the delayed received sequence $Y(k - k_0)$. The
additive noise $N(k - k_0)$ can therefore be reconstructed,
and the missing parameter $\gamma$ can be estimated from the
statistics of $N(k - k_0)$. In fact, the following estimator
for $\gamma$ can be easily derived (along the lines in [5], p. 69)

$$\hat\gamma(K) = \prod_{k=0}^{K-1} |n(k)|^{1/K} \tag{8}$$

given a realization $n(k)$, $k = 0(1)K-1$, of the
random process $N(k - k_0)$. An alternative approach to
the above blockwise data-aided estimation procedure
consists in the application of the per-survivor principle
(PSP). This principle stems from the idea that
data-aided estimation of unknown parameters can be
embedded into the structure of the VA itself. This means
that the estimation of $\gamma$ is done in each trellis branch,
based on the above formula, in a recursive manner:

$$\hat\gamma(K) = \bigl(\hat\gamma(K-1)\bigr)^{(K-1)/K}\, |n(K)|^{1/K}. \tag{9}$$

Alternatively, an exponential window can be used for
recursive estimation of $\hat\gamma(K)$ to allow tracking of
time-variant channels. With these formulas we are able to
reconstruct the current noise value not in an external
way as shown in Fig. 2, but directly in the VA. For
a further introduction to PSP see [8]. At this point
we are able to compare the conventional VA with the
Cauchy-VA as well as the PSP-Cauchy-VA by
numerical simulations.
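The blockwise estimator and its recursive counterpart amount to a geometric mean of the noise magnitudes and a running update of that mean. The sketch below illustrates both; the function names and sample values are assumptions, and the recursion is written with an explicit sample count rather than the index convention of the paper.

```python
import math

def gamma_block(noise):
    """Geometric mean of |n(k)| over the block, as in the blockwise estimator.

    Samples equal to zero are not handled (log of zero).
    """
    count = len(noise)
    return math.exp(sum(math.log(abs(v)) for v in noise) / count)

def gamma_recursive(prev, count, sample):
    """Update a geometric-mean estimate built from `count` samples with one new sample."""
    total = count + 1
    return prev ** (count / total) * abs(sample) ** (1.0 / total)

# Illustrative reconstructed noise values, including one large impulse.
noise = [0.3, -2.0, 0.7, 15.0, -0.1]

estimate = abs(noise[0])
for seen, value in enumerate(noise[1:], start=1):
    estimate = gamma_recursive(estimate, seen, value)
```

After the last update the recursive estimate coincides with the block estimate; in a per-survivor scheme one such running value would be kept for each surviving path.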


4.  NUMERICAL  SIMULATION 

In this simulation, we have carried out 500 Monte Carlo
runs with $K = 5$, $K = 10$ and $K = 20$ transmitted
BPSK samples in each run. The impulse response has
been assumed to be exactly known as $h(0) = 1/\sqrt{5}$,
$h(1) = 2/\sqrt{5}$, $h(k) = 0\ \forall k \notin [0, 1]$. The tentative delay
has been chosen as $k_0 = K$. To measure some kind
of signal-to-noise ratio, we define the signal-to-noise
$p$-norm ratio as

SNR,,  =  101og10  • 

Since $p$ must be smaller than $\alpha$ (see eq. (5)) and $\alpha$ is
equal to one for Cauchy noise, we have chosen
$p = 0.9999$. Figure 3 (4, 5) shows the results for $K = 5$
($K = 10$, $K = 20$).
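A single run of such an experiment can be sketched for small $K$ by replacing the trellis recursion with an exhaustive search over all $2^K$ BPSK sequences, which minimizes the same norms. This is only an illustration under stated assumptions: the value of `gamma`, the seed, the zero initial channel state and the helper names are not taken from the paper, and the Cauchy decoder is given the true `gamma`.

```python
import itertools
import math
import random

H_TAPS = (1 / math.sqrt(5), 2 / math.sqrt(5))   # impulse response from the text

def channel(x):
    """Noise-free output h * x, truncated to len(x), with zero initial state."""
    return [sum(H_TAPS[j] * x[k - j] for j in range(len(H_TAPS)) if k - j >= 0)
            for k in range(len(x))]

def euclidean_cost(y, x):
    """Squared Euclidean norm of the residual, as in eq. (1)."""
    return sum((a - b) ** 2 for a, b in zip(y, channel(x)))

def cauchy_cost(y, x, gamma):
    """Logarithmic norm of the residual, as in eq. (7)."""
    return sum(math.log(gamma + (a - b) ** 2 / gamma) for a, b in zip(y, channel(x)))

def mlse(y, cost):
    """Exhaustive search over all BPSK sequences (stand-in for the trellis search)."""
    return min(itertools.product((-1, 1), repeat=len(y)), key=lambda x: cost(y, x))

def cauchy_noise(gamma, rng):
    """Cauchy sample with scale gamma via the inverse distribution function."""
    return gamma * math.tan(math.pi * (rng.random() - 0.5))

rng = random.Random(1)
K, gamma = 5, 0.05                               # assumed values for the demo
x_true = tuple(rng.choice((-1, 1)) for _ in range(K))
clean = channel(x_true)
noisy = [c + cauchy_noise(gamma, rng) for c in clean]

x_euclid = mlse(noisy, euclidean_cost)
x_cauchy = mlse(noisy, lambda y, x: cauchy_cost(y, x, gamma))
```

Counting the positions where `x_euclid` and `x_cauchy` differ from `x_true` over many runs and noise levels would give bit error rate curves of the kind discussed in this section.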


Figure 3: Bit error rate for $K = 5$ as a function of the
signal-to-noise $p$-norm ratio for the conventional VA
(solid line), the Cauchy-VA with true $\gamma$ (dashed line),
the Cauchy-VA with estimated $\gamma$ (plus signs) and for
the PSP-Cauchy-VA with estimated $\gamma$ (dashed-dotted
line).

It can be seen that not only the Cauchy-VA with
true $\gamma$ but also the Cauchy-VA with estimated $\gamma$ clearly
outperforms the conventional VA. In particular, for an
$\mathrm{SNR}_p$ around 10 dB to 20 dB the bit error rate is reduced
by more than 50 percent. Observe that the curve for true
$\gamma$ almost totally overlaps the curve for estimated $\gamma$. We
can conclude that the estimation error of $\gamma$, even if it is
quite high, does not have a large influence on the
minimization procedure. In other words, the non-Euclidean
norm is very robust against parameter estimation
errors. Observe also that the PSP-Cauchy-VA most often
exhibits the lowest bit error rate.


Figure  4:  Bit  error  rate  for  K  =  10. 



Figure  5:  Bit  error  rate  for  K  =  20. 

Let us now consider the last three figures as a
function of $K = 5, 10, 20$. It can be seen that for $K = 5$,
i.e. for a very small data length, the PSP approach
performs much better than the two other algorithms.
If we now increase $K$, the bit error rates of the
conventional VA and the PSP-VA remain almost
unchanged, whereas the bit error rate of the Cauchy-VA
decreases. For $K = 20$ this bit error rate reaches
the lower one of the PSP approach. This means that
by including the estimation of $\gamma$ in the internal
structure of the VA (the PSP approach), the estimation
error does not play such a crucial role as for the
blockwise approach (Cauchy-VA). If $K$ is increased, the
estimation error of $\gamma$ is reduced, so that the bit error
rate of the Cauchy-VA starts to approach that of
the PSP approach. Consequently, the PSP approach
seems to be very suitable for rapidly time-varying
channels where only a small amount of data is available to
estimate the noise statistics.


5.  CONCLUSIONS 

In this paper we have considered the estimation of only
one parameter, namely $\gamma$, to adapt the conventional VA
to the noise statistics of the channel. To describe
impulsive noise more precisely we will need at least two
parameters: one is responsible for the height of the
impulses, the other for the probability of
occurrence of an impulse. Since future work will deal with
the estimation of at least two parameters (e.g. two
different noise variances in the case of Gaussian mixture
processes), the PSP approach is becoming more
attractive even for a moderate data length. In general, all
these results confirm the advantageous use of the
channel noise statistics to enable a more reliable
transmission.
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ABSTRACT 

Communication networks have to rely on efficient resource allocation
schemes to share the network resources (bandwidth, buffer size, etc.)
among users offering different types of traffic (e.g., voice, video
and data). Existing schemes based on self-similar traffic models
assume that the network traffic is Gaussian and exhibits long-term
memory characteristics only. Certain classes of network traffic (e.g.,
MPEG video traces) are, however, non-Gaussian and long-range
dependent. In such cases, resource allocation based on simplified
assumptions will either be excessive or fail to provide the specified
guarantees on the quality of service (QoS). In an earlier work, we
presented an efficient resource allocation scheme for traffic sources
(i) having Gaussian as well as non-Gaussian (log-normal) distributions
and (ii) exhibiting short-term and/or long-term memory
characteristics. In this paper, we assess the real-time performance of
our scheme as well as several existing schemes using a Texas
Instruments TMS320C6701 DSP processor. The results show that
(i) although our algorithm has a higher computational load, real-time
implementation is still feasible, and (ii) the increased computational
load is justified since the proposed algorithm is more reliable in
providing QoS guarantees than existing simplified schemes.


1.  INTRODUCTION 

A stochastic process $\{S(n),\ n > 0\}$ is said to exhibit
self-similar characteristics if it satisfies the scaling property

$$\{S(a \cdot n)\} \stackrel{d}{=} \{a^{H} S(n)\}, \quad a > 0,\ H > 0;$$

i.e., the statistical characteristics of $S(n)$ at time $a \cdot n$
are a scaled version of its characteristics at time $n$. Therefore,
except for a scaling factor, $S(a \cdot n)$ and $S(n)$ are similar.
Here, $H$ is the Hurst parameter. The presence of self-similar
characteristics in network traffic was first observed in aggregated
Ethernet traffic measurements at Bellcore [7]. Since the original
discovery, several independent measurements over different networks
(including LAN, WAN, ATM and NSFNET) and traffic generated by commonly
used applications such as TELNET, FTP, WWW browsers, and VBR video
have been collected and analyzed. These studies clearly demonstrate
the presence of self-similar characteristics in aggregated network
traffic traces. Several different approaches have been proposed to
deal with the self-similar nature of network traffic. The general
consensus seems to be that self-similar characteristics must be taken
into consideration if the traffic is serviced at medium to high
operating load conditions. Load ($\rho$) is defined as the ratio
between the mean traffic rate and the bandwidth at which it is
serviced.

In this paper, we focus on the computational complexity of resource
allocation schemes that are based on self-similar traffic models. A
network link such as a router or switch is expected to support
thousands of connections. Each connection carries user traffic
requiring certain guarantees on the Quality of Service (QoS) such as
loss, delay, delay jitter, etc. Based on the nature of the traffic and
the specified QoS requirements, the network link has to decide whether
to accept or reject the user connection. The mechanism that makes the
accept/reject decision is called Connection Admission Control (CAC).
CAC relies on resource allocation algorithms to make its decision. It
is important that the resource allocation schemes require minimal
processing and memory overheads so that decisions can be made in real
time. Ideally, one would like to have schemes based on simple
analytical expressions derived using parsimonious traffic models.
However, to the best of our knowledge, such schemes do not currently
exist.

Recent advances in the semiconductor industry have resulted in a
tremendous increase in the computational capabilities of
microprocessors. The access times as well as the cost of memory
devices have fallen dramatically. This trend is expected to continue
for at least the next several years. Therefore, the focus of our work
has been to develop real-time, numerically tractable CAC algorithms.
These algorithms should exploit the available processing power to
accurately allocate the resources needed to support network traffic
exhibiting complex behavior.

2.  SELF-SIMILAR  TRAFFIC  MODELS 

It is important that traffic models accurately capture both the
marginal distribution as well as the autocovariance function exhibited
by the network traffic. Although self-similar processes are inherently
non-stationary, their increments can be stationary. Let $B(n)$ be a
self-similar process with Gaussian marginal distribution and assume
$H \in (0, 1)$. If $B(n)$ has stationary increments,
$x(n) = B(n+1) - B(n)$, then $B(n)$ is called fractional Brownian
Motion (fBM), and the corresponding $x(n)$ is called fractional
Gaussian Noise (fGN) [11]. The autocovariance function of a fGN
process is given by:

$$c_{2x}(\tau) = \frac{c_{2x}(0)}{2}\left\{|\tau - 1|^{2H} - 2|\tau|^{2H} + |\tau + 1|^{2H}\right\}.$$

As $\tau \to \infty$, we have $c_{2x}(\tau) \sim K \cdot |\tau|^{2H-2}$,
where $K$ is a constant and $K \ne 0$. Traditional time-series analysis
assumes the "mixing" condition, i.e., $\sum_{\tau} |c_{2x}(\tau)| < \infty$,
and hence $c_{2x}(\tau)$ decays at a rate faster than $1/|\tau|$. Note
that for the fGN process with $1/2 < H < 1$, $c_{2x}(\tau)$ decays at a
rate slower than $1/|\tau|$, implying that
$\sum_{\tau} c_{2x}(\tau) = \infty$. Stationary processes whose
autocovariance functions exhibit such behavior are called long-memory
or long-range dependent processes. For the rest of this paper, we will
use the terms long-memory and self-similarity interchangeably.
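The slow decay of the fGN autocovariance can be checked numerically. The following is a minimal sketch, assuming the standard fGN autocovariance with $c_{2x}(0)$ as the variance; the function names are ours and purely illustrative. It accumulates the autocovariances over positive lags, a sum that stays at zero for $H = 0.5$ and keeps growing for $H > 0.5$.

```python
def fgn_autocovariance(lag, hurst, variance=1.0):
    """Autocovariance of fractional Gaussian noise at an integer lag."""
    t = abs(lag)
    two_h = 2.0 * hurst
    return 0.5 * variance * (
        abs(t - 1) ** two_h - 2.0 * t ** two_h + (t + 1) ** two_h
    )


def partial_sum(hurst, max_lag):
    """Sum of the autocovariances over lags 1..max_lag (unit variance)."""
    return sum(fgn_autocovariance(t, hurst) for t in range(1, max_lag + 1))


if __name__ == "__main__":
    # H = 0.5 is the i.i.d. case; H = 0.85 is a long-memory case.
    for h in (0.5, 0.85):
        sums = [round(partial_sum(h, n), 3) for n in (10, 100, 1000, 10000)]
        print("H =", h, "partial sums:", sums)
```

For $H = 0.85$ the partial sums grow roughly like $n^{2H-1}$, which is the non-summability that defines long memory.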

FGN based traffic models are simple and amenable to mathematical
analysis. However, network traffic exhibits complex behavior requiring
more sophisticated traffic models. A fractional Autoregressive
Integrated Moving Average (fARIMA) process is an extension of the ARMA
process and exhibits long-memory characteristics. A fARIMA process
differs from a fGN process in the following manner. The cumulative sum
of a fGN process, $B(n) = \sum_{k=0}^{n-1} x(k)$, is exactly
self-similar; the cumulative sum of a fARIMA process is asymptotically
self-similar.

Figure 1. Single server queue.

In the z-domain, the relation between the fARIMA(p,d,q) process $x(n)$
and the driving noise $w(n)$ is as follows:

$$A(z)X(z) = (1 - z^{-1})^{-d}\,B(z)W(z), \quad -0.5 < d < 0.5, \qquad (1)$$

where $A(z) = 1 + a(1)z^{-1} + \cdots + a(p)z^{-p}$,
$B(z) = 1 + b(1)z^{-1} + \cdots + b(q)z^{-q}$ and $w(n)$ is i.i.d.
Gaussian. The presence of a fractional pole at $z = 1$ introduces
long memory. Note that the AR and MA parameters $\{a_i\}_{i=1}^{p}$ and
$\{b_j\}_{j=1}^{q}$ give rise to short-memory characteristics in the
process. Hence, a fARIMA model offers flexibility in accurately
capturing both the short- and long-memory characteristics exhibited by
the network traffic. From the model parameters $d$ ($d = H - 0.5$),
$a = [1, a(1), \cdots, a(p)]$ and $b = [1, b(1), \cdots, b(q)]$, the
theoretical $c_{2x}(\tau)$ can be calculated as follows:

$$c_{2x}(\tau) = \frac{\sigma_w^2\,(-1)^{\tau}\,\Gamma(1-2d)}{\Gamma(\tau-d+1)\,\Gamma(1-\tau-d)} * \sum_{t} h(t)\,h(t+\tau), \qquad (2)$$

where $*$ denotes convolution, $\sigma_w^2$ is the variance of $w(n)$,
$H(z) = B(z)/A(z)$ and
$\Gamma(k) = \int_0^{\infty} t^{k-1} \exp(-t)\,dt$ is the familiar Gamma
function.
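To make the role of the fractional pole concrete, the following minimal sketch expands $(1 - z^{-1})^{-d}$ into its impulse response and evaluates the autocorrelation of the pure fARIMA(0,d,0) case. Both use well-known recursions that follow from the Gamma-function ratios; the function names are ours, and the short-memory part $B(z)/A(z)$ is deliberately left out.

```python
import math


def frac_integration_coeffs(d, n_terms):
    """Impulse response of (1 - z^-1)^(-d), truncated to n_terms taps.

    Uses psi_0 = 1 and psi_j = psi_{j-1} * (j - 1 + d) / j, which equals
    Gamma(j + d) / (Gamma(j + 1) * Gamma(d)) for d > 0.
    """
    coeffs = [1.0]
    for j in range(1, n_terms):
        coeffs.append(coeffs[-1] * (j - 1 + d) / j)
    return coeffs


def farima0_autocorrelation(d, max_lag):
    """Autocorrelation rho(0..max_lag) of a fARIMA(0, d, 0) process.

    Uses rho(k) = rho(k - 1) * (k - 1 + d) / (k - d).
    """
    rho = [1.0]
    for k in range(1, max_lag + 1):
        rho.append(rho[-1] * (k - 1 + d) / (k - d))
    return rho


if __name__ == "__main__":
    d = 0.35  # long-memory parameter, i.e. H = d + 0.5 = 0.85
    print("first taps:", [round(c, 4) for c in frac_integration_coeffs(d, 6)])
    print("rho(1), rho(100):", farima0_autocorrelation(d, 100)[1::99])
```

The taps decay like $j^{d-1}$ and the autocorrelation like $k^{2d-1}$, so for $0 < d < 0.5$ neither dies out quickly.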

Although multiplexed traffic tends to have a Gaussian distribution,
individual traffic sources are seldom Gaussian. For example, MPEG
traffic sources exhibit a heavier tail than Gaussian [10]. In [6], we
proposed a log-normal fARIMA traffic model for MPEG traffic sources.
Here, we model the log-transformed data $y(n) = \ln x(n)$ as fARIMA.
We then infer the autocovariance structure of the traffic source
$x(n)$ through that of $y(n)$. The relationship between the
autocovariance function $c_{2x}(\tau)$ of $x(n)$ and that of $y(n)$ is

$$c_{2x}(\tau) = \exp\{2\mu_y + \sigma_y^2\} \cdot \left(\exp\{c_{2y}(\tau)\} - 1\right). \qquad (3)$$
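Relation (3) is simple to evaluate. The following minimal sketch implements it together with a helper that obtains $\mu_y$ and $\sigma_y^2$ from the mean and standard deviation of $x(n)$ using the usual log-normal moment identities. The helper and both function names are our own assumptions, added only for illustration.

```python
import math


def lognormal_autocovariance(c_y_lag, mu_y, var_y):
    """Autocovariance of x = exp(y) at a lag where y has autocovariance c_y_lag."""
    return math.exp(2.0 * mu_y + var_y) * (math.exp(c_y_lag) - 1.0)


def lognormal_params_from_moments(mean_x, std_x):
    """Mean and variance of y = ln(x) for a log-normal x with given moments."""
    var_y = math.log(1.0 + (std_x / mean_x) ** 2)
    mu_y = math.log(mean_x) - 0.5 * var_y
    return mu_y, var_y


if __name__ == "__main__":
    # Illustrative values: mean 20 KB and standard deviation 8 KB.
    mu_y, var_y = lognormal_params_from_moments(20.0, 8.0)
    # At lag 0, c_2y(0) = var_y, so the result should be the variance of x.
    print(lognormal_autocovariance(var_y, mu_y, var_y))
```

At lag zero the expression returns the variance of $x(n)$, and it returns zero wherever $y(n)$ is uncorrelated.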

3.  RESOURCE  ALLOCATION  FRAMEWORK 

Our resource allocation framework provides statistical QoS guarantees
based on the Effective Bandwidth Theory (EBT). EBT focuses on a
network link such as a router or switch and gives a measure of the
bandwidth and buffer size required to achieve a trade-off between
different performance criteria such as the overflow probability,
delay, jitter, etc.

Typically,  the  operation  of  a  network  link  is  modeled  as 
a  single  server  queue  (Figure  1)  and  the  allotted  resources 
(bandwidth  or  capacity  C  and  buffer  size  B)  are  expected 
to  provide  guarantees  on  the  buffer  overflow  probability. 
Buffer  overflow  probability  is  amenable  to  mathematical 
analysis  and  is  a  commonly  used  performance  index. 

If an input traffic $x(n)$ is offered to a network link, then a burst
of size $k$ will cause an overflow if
$\{x(1) + \cdots + x(k)\} > k \cdot C + B$. Denote the sample mean of
$\{x(n)\}_{n=1}^{k}$ by $\bar{X}_k = \frac{1}{k}\{x(1) + \cdots + x(k)\}$.
Then, if the user demands a loss probability no larger than
$\epsilon$, the capacity $C$ and buffer size $B$ should be such that

$$\Pr\left\{\bar{X}_k > C + \frac{B}{k}\right\} \le \epsilon, \quad \forall k. \qquad (4)$$

Rigorous calculation of the network resources $B$ and/or $C$ based on
(4) requires knowledge of the probability density function (PDF) of
$\bar{X}_k$, the sample mean of a traffic burst of size $k$.

If $x(n)$ is a Gaussian i.i.d. random process with mean $\mu$ and
variance $\sigma^2$, then the required capacity or effective bandwidth
$C$ is given by [1]

$$C = \mu + \frac{\sigma^2}{2}\,\delta, \qquad (5)$$

where $\delta = -\ln\epsilon / B$. Therefore, for a given loss
probability $\epsilon$ and buffer size $B$, one can compute $\delta$
and then substitute into (5) to find the effective bandwidth $C$.
Alternatively, if the network can only afford a service rate $C$, then
$\delta$ can be obtained from (5) and the required buffer size is
$B = \lceil -\ln\epsilon / \delta \rceil$.
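Both directions of (5) can be written in a few lines. This is a minimal sketch, assuming $\delta = -\ln\epsilon / B$ and a buffer size rounded up to the next integer; the function names and the numerical values are illustrative only.

```python
import math


def effective_bandwidth_iid(mean, variance, buffer_size, loss_prob):
    """Capacity C from (5) for a given buffer size and loss probability."""
    delta = -math.log(loss_prob) / buffer_size
    return mean + 0.5 * variance * delta


def buffer_for_bandwidth(mean, variance, capacity, loss_prob):
    """Buffer size B when the service rate C is fixed (assumes C > mean)."""
    delta = 2.0 * (capacity - mean) / variance  # (5) solved for delta
    return math.ceil(-math.log(loss_prob) / delta)


if __name__ == "__main__":
    # Illustrative numbers: mean 20, variance 64, loss probability 1e-3.
    print(effective_bandwidth_iid(20.0, 64.0, 100.0, 1e-3))
    print(buffer_for_bandwidth(20.0, 64.0, 22.0, 1e-3))
```

A larger buffer lowers $\delta$ and therefore the required capacity, which is the trade-off the framework exploits.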

Reference [2] considered Gaussian sources with autocovariance function
$c_{2x}(\tau)$ satisfying $\sum_{\tau} c_{2x}(\tau) < \infty$; e.g.,
ARMA processes, Markov processes (and their variants), etc. Their
approximate formula for the effective bandwidth is

$$C = \mu + \frac{\delta}{2} \sum_{\tau} c_{2x}(\tau). \qquad (6)$$

The presence of long-term memory in network traffic implies that
$\sum_{\tau} c_{2x}(\tau)$ is unbounded. Therefore, (6) tends to be too
conservative in allocating resources ($B$ and $C$) when network
traffic exhibits long-term memory. In [8], the Bellcore traffic traces
were modeled as a fGN process. For a given buffer size $B$, the
effective bandwidth is obtained as

$$C = \mu + \left(\frac{H^{H}(1-H)^{1-H}\sqrt{-2\ln\epsilon}}{B^{1-H}}\right)^{1/H} c_{2x}(0)^{1/(2H)}, \qquad (7)$$

where $H = d + 0.5$. When $d = 0$ ($x(n)$ i.i.d.), it can be verified
that (7) reduces to (5).
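The reduction of (7) to (5) at $d = 0$ can be verified numerically. The sketch below assumes the commonly quoted fGN effective-bandwidth expression with $\kappa(H) = H^{H}(1-H)^{1-H}$; this form is our reading of (7), and the function names are illustrative.

```python
import math


def kappa(hurst):
    """kappa(H) = H^H * (1 - H)^(1 - H)."""
    return hurst ** hurst * (1.0 - hurst) ** (1.0 - hurst)


def effective_bandwidth_fgn(mean, variance, buffer_size, loss_prob, hurst):
    """Effective bandwidth of an fGN source for a fixed buffer size."""
    root = kappa(hurst) * math.sqrt(-2.0 * math.log(loss_prob))
    return mean + (
        root ** (1.0 / hurst)
        * variance ** (1.0 / (2.0 * hurst))
        * buffer_size ** (-(1.0 - hurst) / hurst)
    )


def effective_bandwidth_iid(mean, variance, buffer_size, loss_prob):
    """Reference value from the i.i.d. Gaussian formula (5)."""
    return mean - 0.5 * variance * math.log(loss_prob) / buffer_size


if __name__ == "__main__":
    args = (20.0, 64.0, 100.0, 1e-3)
    print("H = 0.50:", effective_bandwidth_fgn(*args, hurst=0.5))
    print("i.i.d.  :", effective_bandwidth_iid(*args))
    print("H = 0.85:", effective_bandwidth_fgn(*args, hurst=0.85))
```

With $H = 0.5$ the two values coincide, while a larger $H$ demands noticeably more bandwidth for the same buffer.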

Several recent studies have confirmed that network traffic exhibits
both short-term and long-term memory characteristics [3, 4].
Calculation of the effective bandwidth based on a model that captures
only the long-term memory property is inadequate when $x(n)$ exhibits
both short-term and long-term memory characteristics. In [5], we
presented a resource allocation scheme based on a Gaussian fARIMA
traffic model. For a specified buffer overflow probability $\epsilon$,
the optimal $(B, C)$ pair is obtained by minimizing the cost function

$$J_{(B,C)}(k) = \frac{\left(C - \mu + \frac{B}{k}\right)^2}{2\gamma_k} + \ln(\epsilon) - \ln(0.5) \ge 0, \quad \forall k. \qquad (8)$$

Here, $\gamma_k = \frac{1}{k}\sum_{|\tau|<k}\left(1 - \frac{|\tau|}{k}\right) c_{2x}(\tau)$
is the variance of the sample mean $\bar{X}_k$. Figure 2 shows a plot of
$J_{(B,C)}(k)$ for different $(B, C)$ pairs. It is seen that for each
$(B, C)$ pair, $J_{(B,C)}(k)$ has a unique minimum at $k = k_0$. If
$J_{(B,C)}(k_0) \ge 0$, then (8) is satisfied $\forall k$. A $(B, C)$
pair is optimum if $\min_k J_{(B,C)}(k) = J_{(B,C)}(k_0) = 0$.
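The cost function can be evaluated directly from any autocovariance sequence. The sketch below assumes that $\gamma_k$ is the variance of the $k$-sample mean, i.e. it includes the $1/k$ factor, and uses an fGN autocovariance only as an example input; all names are ours.

```python
import math


def fgn_autocov(tau, hurst, variance=1.0):
    """fGN autocovariance at integer lag tau (example input only)."""
    t = abs(tau)
    e = 2.0 * hurst
    return 0.5 * variance * (abs(t - 1) ** e - 2.0 * t ** e + (t + 1) ** e)


def sample_mean_variance(autocov, k):
    """gamma_k = (1/k) * sum over |tau| < k of (1 - |tau|/k) * c(tau)."""
    total = autocov(0)
    for tau in range(1, k):
        total += 2.0 * (1.0 - tau / k) * autocov(tau)  # lags +tau and -tau
    return total / k


def cost(autocov, mean, buffer_size, capacity, loss_prob, k):
    """Cost J(k) for one (B, C) pair; the pair is admissible if J(k) >= 0 for all k."""
    gamma_k = sample_mean_variance(autocov, k)
    excess = capacity - mean + buffer_size / k
    return excess ** 2 / (2.0 * gamma_k) + math.log(loss_prob) - math.log(0.5)


if __name__ == "__main__":
    acf = lambda tau: fgn_autocov(tau, 0.85, 64.0)
    for k in (1, 10, 40, 200):
        print(k, round(cost(acf, 20.0, 100.0, 34.0, 1e-3, k), 3))
```

For an fGN input the sample-mean variance equals $c_{2x}(0)\,k^{2H-2}$, which provides a convenient sanity check on the summation.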

For $x(n)$ non-Gaussian, the true PDF of the $k$-sample mean
$\bar{X}_k$ does not usually have a closed-form expression. In [6] we
presented a resource allocation scheme based on a log-normal fARIMA
traffic model. We approximated $\bar{X}_k$ by a log-normal random
variable $\tilde{X}_k$ for which $\ln \tilde{X}_k$ is Gaussian with
mean $\tilde{\mu}_k$ and variance $\tilde{\sigma}_k^2$. Now going back
to the resource allocation framework (4), we replace $\bar{X}_k$ by
$\tilde{X}_k$ and write

$$\Pr\{\ln \tilde{X}_k > \ln(C + B/k)\} \le \epsilon, \quad \forall k.$$

Since $\ln \tilde{X}_k$ is a Gaussian random variable with mean
$\tilde{\mu}_k$ and variance $\tilde{\sigma}_k^2$, we apply the Gaussian
resource allocation framework to obtain the cost function

$$J_{(B,C)}(k) = \frac{\left(\ln(C + B/k) - \tilde{\mu}_k\right)^2}{2\tilde{\sigma}_k^2} + \ln(2\epsilon) \ge 0, \quad \forall k, \qquad (9)$$

to compute the optimal $(B, C)$ pair.

Figure 2. Plot of $J_{(B,C)}(k)$ for different $(B, C)$ pairs.

In the next section, we will analyze the computational complexity of
our fARIMA based resource allocation schemes.

4.  REAL-TIME  IMPLEMENTATION 

Resource allocation algorithms typically run as a software module in a
network device such as a router or switch. In order to analyze the
real-time performance of fARIMA model based resource allocation
algorithms, we selected the Texas Instruments (TI) TMS320C6701
floating-point DSP processor. The processor's dedicated
multiplier-accumulator circuitry, multiple access and special memory
addressing modes designed to speed up repetitive operations make it an
ideal platform for CAC algorithm implementation. We operated the
processor at a clock frequency of 133 MHz. The processor's on-chip
program and data memory (64 KB each) were found to be sufficient for
implementing our algorithm. The entire algorithm was implemented in
the C programming language.

4.1.  Implementation  Details 

The algorithm first obtains the QoS requirement and the traffic model
parameters from the user. Then, it calculates the autocovariance
function $c_{2x}(\tau)$ from the model parameters. In our current
implementation, we fixed the maximum lag value ($\tau_{\max}$) at 1024.
The correlation values are stored in a double-precision floating-point
array. An implication of using an upper limit on $\tau$ is that the
algorithm can only compute those optimal $(B, C)$ pairs for which the
burst size $k_0 = \arg\min_k J_{(B,C)}(k)$ is less than $\tau_{\max}$.

Based on the QoS parameter specified by the user and existing traffic
conditions in that class of service, the network link allots a fixed
buffer size $B$ to the user traffic. Regulating the buffer size
provides flexibility in handling delay requirements. Once $B$ is
fixed, the optimal bandwidth $C_{opt}$ needed to support the connection
is computed. Recall that when $C = C_{opt}$, we have
$\min_k J_{(B,C)}(k) = 0$. From equations (8)-(9), we observe that
$J_{(B,C)}(k)$ is monotonically increasing with $C$. Therefore if
$C > C_{opt}$, we have that $\min_k J_{(B,C)}(k) > 0$, whereas if
$C < C_{opt}$, we have that $\min_k J_{(B,C)}(k) < 0$. This prompts us
to employ an iterative search algorithm to find $C_{opt}$. We first
pick $C_1$ and $C_2$ such that $\min_k J_{(B,C_1)}(k) > 0$ and
$\min_k J_{(B,C_2)}(k) < 0$. We can be sure that $C_{opt}$ lies between
$C_1$ and $C_2$, i.e., $C_1 > C_{opt} > C_2$, and we call $C_1$ and
$C_2$ the bracket points. Next, we would like to narrow down this
bracket. At the $i$th iteration, pick $C_i$ such that
$C_1 > C_i > C_2$. If $\min_k J_{(B,C_i)}(k) > 0$, then we infer that
$C_i > C_{opt} > C_2$ and we replace $C_1$ with $C_i$. On the other
hand, if $\min_k J_{(B,C_i)}(k) < 0$, then we must have
$C_1 > C_{opt} > C_i$ and we replace $C_2$ with $C_i$. By successively
narrowing down the range, we bring the bracket points together until
they converge to $C_{opt}$. Brent's method in particular provides
superlinear convergence to the optimal solution [9].
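The bracketing idea can be sketched with plain bisection in place of Brent's method, which converges more slowly but follows exactly the replacement rule described above. The sketch assumes the Gaussian cost (8) with $\gamma_k$ taken as the variance of the $k$-sample mean, an fGN autocovariance as example input, and an exhaustive minimum over $k = 1, \ldots, 1024$; all names and numerical values are ours.

```python
import math


def fgn_autocov(tau, hurst, variance):
    """fGN autocovariance at integer lag tau (example input only)."""
    t = abs(tau)
    e = 2.0 * hurst
    return 0.5 * variance * (abs(t - 1) ** e - 2.0 * t ** e + (t + 1) ** e)


def gamma_table(autocov, k_max):
    """Return [gamma_1, ..., gamma_k_max] using running sums over the lags."""
    table, s1, s2 = [], 0.0, 0.0
    c0 = autocov(0)
    for k in range(1, k_max + 1):
        table.append((c0 + 2.0 * (s1 - s2 / k)) / k)
        c = autocov(k)
        s1 += c       # sum of c(tau) for tau = 1..k
        s2 += k * c   # sum of tau * c(tau) for tau = 1..k
    return table


def min_cost(gammas, mean, buffer_size, capacity, loss_prob):
    """Minimum over k of the cost J(k) for one (B, C) pair."""
    shift = math.log(loss_prob) - math.log(0.5)
    return min(
        (capacity - mean + buffer_size / k) ** 2 / (2.0 * g) + shift
        for k, g in enumerate(gammas, start=1)
    )


def find_c_opt(gammas, mean, buffer_size, loss_prob, c_hi, c_lo, tol=1e-6):
    """Bisection between bracket points with min J > 0 at c_hi and < 0 at c_lo."""
    if not (min_cost(gammas, mean, buffer_size, c_hi, loss_prob) > 0.0
            > min_cost(gammas, mean, buffer_size, c_lo, loss_prob)):
        raise ValueError("c_hi and c_lo do not bracket the optimum")
    while c_hi - c_lo > tol:
        c_mid = 0.5 * (c_hi + c_lo)
        if min_cost(gammas, mean, buffer_size, c_mid, loss_prob) > 0.0:
            c_hi = c_mid  # optimum lies below c_mid
        else:
            c_lo = c_mid  # optimum lies at or above c_mid
    return 0.5 * (c_hi + c_lo)


if __name__ == "__main__":
    gammas = gamma_table(lambda t: fgn_autocov(t, 0.85, 64.0), 1024)
    print(find_c_opt(gammas, 20.0, 100.0, 1e-3, c_hi=200.0, c_lo=20.0))
```

Precomputing the $\gamma_k$ table once keeps each evaluation of $\min_k J_{(B,C)}(k)$ linear in the maximum lag, which matters because the bandwidth search calls it repeatedly.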

In the process of searching for an optimal bandwidth, the algorithm
needs to repeatedly search for the burst size $k_0$ that results in
the minimum cost function. For a given bandwidth $C$, the algorithm
first obtains a three-point bracket $(k_a, k_b, k_c)$ that captures
the minimum [9, page 400]. Then, applying the "Golden Search" method,
it identifies the burst size $k_0$ within the interval $(k_a, k_c)$.
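A golden-section search over the burst size can be sketched as follows. For simplicity the sketch assumes that a bracketing interval is already known and treats $k$ as a continuous variable, using the closed-form fGN value $\gamma_k = c_{2x}(0)\,k^{2H-2}$ in the Gaussian cost; these simplifications and all names are ours.

```python
import math

INV_PHI = (math.sqrt(5.0) - 1.0) / 2.0  # 1 / golden ratio, about 0.618


def golden_section_min(f, a, c, tol=1e-6):
    """Locate the minimiser of a unimodal function f on the interval [a, c]."""
    b = c - INV_PHI * (c - a)
    d = a + INV_PHI * (c - a)
    fb, fd = f(b), f(d)
    while c - a > tol:
        if fb < fd:
            # Minimum lies in [a, d]; the old b becomes the new upper probe.
            c, d, fd = d, b, fb
            b = c - INV_PHI * (c - a)
            fb = f(b)
        else:
            # Minimum lies in [b, c]; the old d becomes the new lower probe.
            a, b, fb = b, d, fd
            d = a + INV_PHI * (c - a)
            fd = f(d)
    return 0.5 * (a + c)


def fgn_cost(k, excess_rate, buffer_size, hurst, variance, loss_prob):
    """Gaussian cost J(k) for an fGN source, with excess_rate = C - mean."""
    gamma_k = variance * k ** (2.0 * hurst - 2.0)
    return (excess_rate + buffer_size / k) ** 2 / (2.0 * gamma_k) + math.log(
        2.0 * loss_prob
    )


if __name__ == "__main__":
    f = lambda k: fgn_cost(k, 14.0, 100.0, 0.85, 64.0, 1e-3)
    print(golden_section_min(f, 1.0, 1024.0))
```

Each iteration reuses one of the two interior evaluations, so only one new cost evaluation is needed per step.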

If the network link can afford to allocate $C_{opt}$ to the user, it
goes ahead and accepts the user connection. If it does not have the
required bandwidth, the algorithm provides the user with an option to
renegotiate its QoS requirement.

4.2.  Performance  Evaluation 

We experimented with an MPEG video trace, "Dino", obtained from [10].
"Dino" has a heavier tail than Gaussian. Its mean and standard
deviation are 20 KB and 8 KB respectively, its long-memory parameter
is $d = 0.35$, and its short-memory parameters are $a = [1, -0.56]$
and $b = [1]$. Suppose this source is to be admitted to the
network  with  a  desired  loss  probability  e  —  10  ' .  Based 
on  this  QoS  parameter,  the  CAC  algorithm  allocates  buffer 
size  B  and  bandwidth  C  according  to  the  assumed  traffic 
model  (c.f.  Section  3). 

If the traffic is assumed to be Gaussian and long-range dependent
only, then the required bandwidth for a fixed buffer size can be
calculated directly using (7). The fGN model based resource allocation
scheme has the lowest computational demand. If the traffic is assumed
to be Gaussian with both short- and long-memory characteristics, then
the optimal resources can be obtained by minimizing (8). If we model
the traffic as a log-normal fARIMA process to accurately capture its
marginal distribution, then the optimal resources can be obtained by
minimizing (9).

Table 1 shows the CPU clock counts required for the fARIMA model based
resource allocation algorithms to converge. For $B$ ranging from 50 KB
to 400 KB, the algorithms converge within 100 milliseconds. We observe
that their rate of convergence depends on the following factors:

•  Buffer size ($B$) - the execution time increases with the allotted
buffer size. For large buffer sizes, the search space to compute the
optimal $C$ is larger, resulting in an increase in execution time.

•  Bracketing strategy - the execution time depends on the number of
times the algorithm evaluates different $C$'s to obtain the bracket
points.

•  Strategy  for  obtaining  Copt  within  the  bracket  points. 

•  Strategy  for  obtaining  the  minimum  of  J(B,c)(k). 

The execution time can be further reduced by (i) using a DSP processor
with a higher MIPS (million instructions per second) rating,
(ii) implementing key modules in assembly language, (iii) identifying
modules that can be implemented in parallel and (iv) developing better
bracketing and search strategies.

(a) Gaussian

B (KB)    C_opt (KB)    Cycle Count    Execution Time (m-sec)
  50        47.12        9,831,479             74
 100        44.25       10,265,182             77
 200        41.52       11,732,340             88
 300        40.05       12,040,384             90
 400        39.07       12,246,093             92

(b) Log-normal

B (KB)    C_opt (KB)    Cycle Count    Execution Time (m-sec)
  50        84.57        8,891,587             67
 100        75.77        9,401,718             71
 200        66.09       11,201,191             84
 300        60.68       11,761,252             88
 400        57.16       12,069,090             91

Table 1. Real-time implementation of fARIMA model based resource
allocation schemes: (a) Gaussian (b) log-normal.
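The execution times in Table 1 follow from the cycle counts and the clock frequency. As a small worked check, assuming the nominal 133 MHz clock stated in Section 4, the conversion is:

```python
CLOCK_HZ = 133e6  # nominal clock frequency of the DSP, as stated in Section 4

# (buffer size in KB) -> (cycle count, reported time in ms), copied from Table 1
GAUSSIAN = {50: (9_831_479, 74), 100: (10_265_182, 77), 200: (11_732_340, 88),
            300: (12_040_384, 90), 400: (12_246_093, 92)}
LOG_NORMAL = {50: (8_891_587, 67), 100: (9_401_718, 71), 200: (11_201_191, 84),
              300: (11_761_252, 88), 400: (12_069_090, 91)}


def cycles_to_ms(cycles, clock_hz=CLOCK_HZ):
    """Convert a CPU cycle count to milliseconds."""
    return 1e3 * cycles / clock_hz


if __name__ == "__main__":
    for name, table in (("Gaussian", GAUSSIAN), ("log-normal", LOG_NORMAL)):
        for b, (cycles, reported) in table.items():
            print(name, b, round(cycles_to_ms(cycles), 2), reported)
```

The computed values agree with the tabulated times to within one millisecond.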

Is it important to have resource allocation schemes based on traffic
models that accurately capture the marginal distribution and the
autocovariance structure exhibited by the network traffic? Why can't
we use fGN model based schemes for all traffic sources? Table 2 shows
single server queue simulation results that illustrate the importance
of having such schemes. Based on the log-normal fARIMA model
parameters obtained from "Dino", we generated a synthetic traffic
trace of length $10^7$ samples to offer as input to the single server
queue. Table 2(a) gives the buffer overflow probability when the
network link allocates resources based on the fGN traffic model.
Table 2(b) gives the overflow probability when the resources are
allocated based on the Gaussian fARIMA traffic model. From the
results, we observe that resource allocation schemes based on
inaccurate traffic models fail to maintain the specified guarantees on
the overflow probability. When the resources are allotted based on the
log-normal fARIMA traffic model, which accurately captures the
marginal distribution as well as the autocovariance structure of the
traffic, the allotted resources provide the specified guarantees; see
the results in Table 2(c).
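A single server queue experiment of this kind can be sketched in a few lines. The sketch below is not the simulator used for Table 2: it assumes a discrete-time queue in which work above the buffer size is dropped, counts the fraction of time slots with an overflow, and feeds the queue with i.i.d. log-normal samples instead of a log-normal fARIMA trace. All names and values are illustrative.

```python
import random


def overflow_fraction(traffic, capacity, buffer_size):
    """Fraction of time slots in which the queue content exceeds the buffer."""
    queue, overflows, slots = 0.0, 0, 0
    for x in traffic:
        queue = max(queue + x - capacity, 0.0)  # arrivals minus service
        if queue > buffer_size:
            overflows += 1
            queue = buffer_size  # the excess work is lost
        slots += 1
    return overflows / slots


def lognormal_trace(n, mu_y, sigma_y, seed=0):
    """i.i.d. log-normal samples (no memory; a stand-in for a real trace)."""
    rng = random.Random(seed)
    return [rng.lognormvariate(mu_y, sigma_y) for _ in range(n)]


if __name__ == "__main__":
    # mu_y and sigma_y chosen so that the mean is about 20 and the std about 8.
    trace = lognormal_trace(200_000, 2.9214, 0.3853, seed=1)
    for capacity in (22.0, 25.0, 30.0):
        print(capacity, overflow_fraction(trace, capacity, 50.0))
```

Replacing the i.i.d. generator with a long-memory trace is what makes the overflow probability sensitive to the choice of traffic model.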

5.  CONCLUSIONS 

The fGN traffic model based resource allocation scheme has the lowest
computational requirements for calculating the optimal $(B, C)$ pair.
However, it is not sufficient for accurately capturing the statistical
characteristics of different traffic types. FARIMA based traffic
models provide more flexibility in parsimoniously capturing the
short- and long-memory characteristics of network traffic. Although
our fARIMA model based resource allocation schemes have a higher
computational load, real-time implementation is still feasible. When
implemented on a TI TMS320C6701 DSP processor, the algorithms
typically converge within 100 milliseconds. Several optimization
techniques have been identified which can further reduce the execution
time significantly.


Abstract - This paper addresses the Blind Source Separation (BSS)
problem in the context of "heavy-tailed", or "impulsive" source
signals, characterized by the nonexistence of finite second (or
higher) order moments. We consider Pham's Quasi-Maximum Likelihood
(QML) approach, a modification of the Maximum Likelihood (ML)
approach, applied using some presumed distributions of the sources. We
introduce a related family of suboptimal estimators, termed Restricted
QML (RQML). A theoretical analysis of the asymptotic performance of
RQML is presented. The analysis is used for showing that the variance
of the optimal (non-RQML) estimator's error must decrease at a rate
faster than 1/T (where T is the number of independent observations).
This surprising property, sometimes called super-efficiency, has been
observed before (in the BSS context) only for finite-support source
distributions. Simulation results illustrate the good agreement with
theory.

I.  Introduction 

Heavy-tailed distributions assign relatively high probabilities to the
occurrence of large deviations from the median, and are often used for
modelling impulsive signals. A common characteristic property of many
heavy-tailed distributions (such as the $\alpha$-stable family) is the
nonexistence of finite second (or higher) order moments, which will
play an important role in this work.

The simplest Blind Signal Separation (BSS) model assumes that $N$
instantaneous linear mixtures of $N$ independent stationary source
signals are observed. They are formulated as
$\mathbf{x}(t) = \mathbf{A} \cdot \mathbf{s}(t)$, where $\mathbf{x}(t)$
and $\mathbf{s}(t)$ are $N \times 1$ column vectors of the observed
signals and source signals (respectively), and the square $N \times N$
"mixing matrix" $\mathbf{A}$ contains the mixing coefficients. The BSS
problem consists of recovering the sources $\mathbf{s}(t)$ using only
the observed data $\mathbf{x}(t)$, $t = 1, 2, \ldots, T$, and the
assumption of independence between the entries of $\mathbf{s}(t)$. It
can be formulated as the computation of an $N \times N$ "separating
matrix" $\mathbf{B}$ whose output
$\mathbf{y}(t) = \mathbf{B} \cdot \mathbf{x}(t)$ is an estimate of
$\mathbf{s}(t)$.
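The mixing model is illustrated by the following minimal sketch for $N = 2$, in which the separating matrix is simply taken as the exact inverse of a known mixing matrix. In the blind setting this inverse is of course not available; the sketch only shows the signal model, with standard Cauchy sources as an example of a heavy-tailed distribution. All names are ours.

```python
import math
import random


def cauchy_sample(rng):
    """Standard Cauchy draw: a heavy-tailed source with no finite variance."""
    return math.tan(math.pi * (rng.random() - 0.5))


def mat_vec(m, v):
    """Matrix-vector product for small dense matrices stored as nested lists."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


def inverse_2x2(m):
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def mix_and_separate(mixing, n_samples, seed=0):
    """Generate sources s(t), mix them as x = A s, and recover y = B x."""
    rng = random.Random(seed)
    separating = inverse_2x2(mixing)  # ideal B = A^-1, unknown in practice
    sources, outputs = [], []
    for _ in range(n_samples):
        s = [cauchy_sample(rng), cauchy_sample(rng)]
        x = mat_vec(mixing, s)
        sources.append(s)
        outputs.append(mat_vec(separating, x))
    return sources, outputs


if __name__ == "__main__":
    src, out = mix_and_separate([[1.0, 0.5], [0.3, 1.0]], 5, seed=1)
    for s, y in zip(src, out):
        print([round(v, 4) for v in s], [round(v, 4) for v in y])
```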

There are several well-known methods for BSS (see, e.g., [1] for a
comprehensive survey), based on Information Theory, High-Order
Statistics (HOS), etc. Some of these methods (e.g., using the
whiteness constraint or HOS) assume that the sources have finite
second (or higher) order moments. Other methods, such as Pham's Quasi
Maximum Likelihood (QML) [2], avoid this assumption in the estimation
algorithm; however, it is still used for the error analysis.

Thus, when the source signals are "impulsive" with no finite second
(or higher) order moments, the use and/or the error analysis of
existing BSS algorithms become problematic in many respects. This
paper is concerned with the asymptotic ($T \to \infty$) error analysis
in such cases. To facilitate the analysis, the sources are assumed to
be strictly white, i.e., each source signal is a sequence of
independent, identically distributed (i.i.d.) random variables.
Symmetric distributions are also assumed.

II.  QML  and  RQML 

A popular variant of the Maximum Likelihood (ML) estimator in "blind"
contexts is the QML estimator [2], [3]. QML attempts to apply the ML
approach using some given hypothetical model for the sources'
probability distribution function (p.d.f.) as a substitute for the
true unknown model. This leads to solving (with respect to the
elements of $\mathbf{B}$) the following system of estimating
equations, as outlined in [2]:

$$\hat{E}[\Psi_i(y_i)\,y_j] = 0, \qquad 1 \le i \ne j \le N, \qquad (1)$$

where:

$\hat{E}[\cdot]$ is the time-averaging operator,
$\hat{E}[z] = \frac{1}{T}\sum_{t=1}^{T} z(t)$;

$y_i$ is the $i$-th estimated source signal at the output of the
separating matrix $\mathbf{B}$;

$\Psi_i(x)$ are some nonlinear "separation functions", chosen a
priori, which would optimally be the (unknown) score functions of the
$i$-th source.

The "small errors" analysis derived for this estimator in [2] is not
valid for heavy-tailed sources which do not possess finite second
moments. Note, for example, that even when $\mathbf{B}$ equals
$\mathbf{A}^{-1}$ and we have $\mathbf{y} = \mathbf{s}$, the left-hand
side of (1), $\hat{E}[\Psi_i(s_i)\,s_j]$, does not converge to 0 in the
$L^2$ sense (since $E[s_j^2]$ is infinite), which undermines the
validity of a second-order-based analysis in this case. In order to
mitigate this difficulty, we propose the following restricted
estimator, which we term the "Restricted QML" (RQML):

$$\hat{E}[\Psi_i(y_i)\,y_j\,I(|y_j| < C)] = 0, \qquad (2)$$

where the indicator function $I(|x| < C)$ equals 1 iff $|x| < C$ (and
equals 0 otherwise), and $C$ is some arbitrary positive constant.
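The effect of the indicator in (2) can be seen in a minimal sketch of the left-hand side of the estimating equation. The separation function used here, $\Psi(x) = 2x/(1+x^2)$, is the score function of the standard Cauchy distribution and is our own choice for illustration, as are the function names and the value of $C$.

```python
import math
import random


def psi(x):
    """Score function of the standard Cauchy p.d.f.: odd and bounded by 1."""
    return 2.0 * x / (1.0 + x * x)


def rqml_statistic(y_i, y_j, c):
    """Time average of psi(y_i) * y_j * I(|y_j| < c), the left side of (2)."""
    total = 0.0
    for a, b in zip(y_i, y_j):
        if abs(b) < c:  # indicator: samples with |y_j| >= c are discarded
            total += psi(a) * b
    return total / len(y_i)


def cauchy_sample(rng):
    """Standard Cauchy draw (heavy-tailed, infinite second moment)."""
    return math.tan(math.pi * (rng.random() - 0.5))


if __name__ == "__main__":
    rng = random.Random(0)
    s1 = [cauchy_sample(rng) for _ in range(20000)]
    s2 = [cauchy_sample(rng) for _ in range(20000)]
    # Perfect separation (y = s): restricted versus unrestricted time average.
    print("RQML, C = 10:", rqml_statistic(s1, s2, 10.0))
    print("QML, C = inf:", rqml_statistic(s1, s2, float("inf")))
```

Because every retained term is bounded in magnitude by $C$, the restricted average has finite variance even though the sources do not.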

III.  Small  errors  analysis  of  RQML 

We now introduce a "small errors" analysis for the RQML estimate.
Because of the permutation ambiguities, a "good" separation procedure
need not produce a $\mathbf{B}$ close to the inverse of the true
$\mathbf{A}$, but only that $\mathbf{B} \cdot \mathbf{A}$ be close to a
permuted diagonal matrix. However, if we permute and scale both
$\mathbf{B}$ and $\mathbf{A}$ by the same convention, we may expect
that $\mathbf{B} \cdot \mathbf{A} = \mathbf{I} - \boldsymbol{\varepsilon}$,
where $\mathbf{I}$ is the identity matrix, and
$\boldsymbol{\varepsilon}$ is a "small" matrix. Now since
$\mathbf{y} = \mathbf{B} \cdot \mathbf{A} \cdot \mathbf{s}$ we have
$y_i = s_i - \sum_{j=1}^{N} \varepsilon_{ij} s_j$, where $s_j$ denotes
the $j$-th source and $\varepsilon_{ij}$ denotes the general element of
the error matrix $\boldsymbol{\varepsilon}$.

0-7803-7011-2/01/$10.00 ©2001 IEEE

To simplify the exposition, from now on we assume a two sensors - two
sources model ($N = 2$), using identical separation functions
$\Psi_i = \Psi$. Generalization to more sensors and sources with
individual $\Psi_i$'s is straightforward, but will not be pursued
here. Thus, for $N = 2$,
$y_i = (1 - \varepsilon_{ii}) s_i - \varepsilon_{ij} s_j$. Under the
"small errors" assumption, $\varepsilon_{ii} \ll 1$ (asymptotically),
so that

$$y_i \approx s_i - \varepsilon_{ij} s_j, \qquad i, j = 1, 2;\ i \ne j. \qquad (3)$$


usually  decrease  to  zero  at  ±oo,  this  tailoring  would  not 
change  T  (x)  by  much  if  C\  is  large  enough. 

Substituting  (5a)  into  (4a)  we  get 

E  [tf(ai  -  Sl2S2)(s2  -  £2lSlU(|sl  -  £12S2|  <  Cl)- 

•/(|*2-£2isi|  <C)]  =  0  (6) 


It  is  relatively  straightforward  to  show,  that  under  the 
’’small  errors”  assumption  and  with  C\  <C  C,  the  product 
Jflsi  —  £i2S2|  <  Ci)/(|s2  —  £2i-Si |  <  C)  is  nonzero  only  if 
| .32 j  <  C  (to  some  approximation).  Thus,  we  may  now  use 
the  Taylor  expansion  of  'I'(si  —  £12^2)  about  8\.  This  is 
legitimate  wherever  the  product  of  indicators  is  nonzero, 
since  £i2s2  •C  s\  (£12  — >  0)  and  £i2S2  is  bounded  (by 
£12 C),  and  we  have: 


E  [(£12 s2f]  <  E  [(£12C)2]  =  C2E  [s212]  r-=S°  0.  (7) 


Substituting  (3)  into  (2)  we  have  (for  the  TV  =  2  case) 

E  [«(«!  -  £12S2)(S2  -  £2lSl)/(|S2  ~  £21*1 1  <  C)]  =  0  (4a) 

E  [¥(sa  -  £21Si)(si  -  £l2S2)^(|Sl  -  sus2\  <  C)]  =  0  (4b) 

Eqns. (4a) and (4b) implicitly relate the off-diagonal elements
of $\varepsilon$ to $s(1:T)$ (i.e., to $s_1(1), s_2(1), \ldots, s_1(T), s_2(T)$).
When $\varepsilon_{ij}$ can be formulated as an explicit function of
$s(1:T)$, it is relatively straightforward to deduce the
statistics of the errors from those of the sources. However,
for arbitrary $\Psi(x)$, there is no explicit closed-form
solution expressing $\varepsilon_{ij}$ in terms of $s(1:T)$. Consequently,
the error analysis becomes more involved. We have to employ
some general statistical assumptions on the errors $\varepsilon_{ij}$
in order to find their more particular statistical properties.
The general assumptions to be used are:

• A1: $L_2$ consistency: $\varepsilon_{ij} \to 0$ (in the $L_2$ sense) as $T \to \infty$;
and

• A2: Stronger convergence of higher powers of the error:
i.e., $\varepsilon_{ij}^n$ converges faster to zero as the order $n$ increases;
consequently, $\varepsilon_{ij}^n \ll \varepsilon_{ij}$ (asymptotically) for $n = 2, 3, \ldots$.

We restrict the discussion to separation functions $\Psi(x)$
which are differentiable, odd and bounded. In addition,
we now further assume that $\Psi(x)$ satisfies the following
condition:

• B: There exists some finite $C_1$ ($C_1 \ll C$), such that
$\Psi(x)$ (and its derivative $\Psi'(x)$) vanish outside the region
$|x| < C_1$. This condition can be formulated as:

$$\Psi(x) = \Psi(x) \cdot I(|x| < C_1) \tag{5a}$$

$$\Psi'(x) = \Psi'(x) \cdot I(|x| < C_1), \tag{5b}$$

where $C_1 \ll C$.

Generally, this is a rather restrictive condition, used
here only to simplify the derivation. In [4] we show how
this condition can be considerably relaxed. Note, however,
that any differentiable estimating function $\Psi(x)$ can
be tailored at the boundaries so as to roll off smoothly
to zero inside the region $|x| < C_1$. Since all useful $\Psi(x)$


So under assumption A1 we have $\varepsilon_{12}s_2 \to 0$ (in the $L_2$ sense), and it is
sufficient to expand $\Psi(s_1 - \varepsilon_{12}s_2)$ about $s_1$ up to first
order, since higher orders of $\varepsilon_{12}s_2$ are negligible under
assumption A2. It is important to observe that, although
the QML estimator is equivalent to RQML when $C \to \infty$,
we then have $E[(\varepsilon_{12}s_2)^2] \approx E[\varepsilon_{12}^2] \cdot E[s_2^2] \to \infty$
for any number of observations $T$. So, essentially, the
reason we can use $L_2$ analysis of the Taylor expansion
of $\Psi(s_1 - \varepsilon_{12}s_2)$ about $s_1$, and neglect higher powers of
$\varepsilon_{12}s_2$, is the presence of the restricting constant $C$ used
in RQML.

Expanding $\Psi(s_1 - \varepsilon_{12}s_2)$ up to first order,

$$\Psi(s_1 - \varepsilon_{12}s_2) \approx \Psi(s_1) - \Psi'(s_1)\varepsilon_{12}s_2, \tag{8}$$

and substituting into (4a) we get:

$$\hat{E}\big[\{\Psi(s_1) - \Psi'(s_1)\varepsilon_{12}s_2\}(s_2 - \varepsilon_{21}s_1)\, I(|s_2 - \varepsilon_{21}s_1| < C)\big] = 0. \tag{9}$$

Using condition B we obtain

$$\hat{E}\big[\{\Psi(s_1) - \Psi'(s_1)\varepsilon_{12}s_2\}(s_2 - \varepsilon_{21}s_1)\, I(|s_1| < C_1)\, I(|s_2 - \varepsilon_{21}s_1| < C)\big] = 0. \tag{10}$$

It is again relatively straightforward to show, under the
assumptions of "small errors" and that $C \gg C_1$, that
the following products of indicator functions are nearly
equivalent:

$$I(|s_1| < C_1)\, I(|s_2 - \varepsilon_{21}s_1| < C) \approx I(|s_1| < C_1)\, I(|s_2| < C).$$

So we may write

$$\hat{E}\big[\{\Psi(s_1) - \Psi'(s_1)\varepsilon_{12}s_2\}(s_2 - \varepsilon_{21}s_1)\, I(|s_1| < C_1)\, I(|s_2| < C)\big] = 0. \tag{11}$$

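We use condition B again to obtain:

$$
\begin{aligned}
\hat{E}\big[\{\Psi(s_1) &- \Psi'(s_1)\varepsilon_{12}s_2\}(s_2 - \varepsilon_{21}s_1)\, I(|s_2| < C)\big] \\
&= \underbrace{\hat{E}\big[\Psi(s_1)s_2 I(|s_2| < C)\big]}_{A_1}
 - \varepsilon_{21}\underbrace{\hat{E}\big[\Psi(s_1)s_1 I(|s_2| < C)\big]}_{A_2} \\
&\quad - \varepsilon_{12}\underbrace{\hat{E}\big[\Psi'(s_1)s_2^2 I(|s_2| < C)\big]}_{A_3}
 + \varepsilon_{12}\varepsilon_{21}\underbrace{\hat{E}\big[\Psi'(s_1)s_1 s_2 I(|s_2| < C)\big]}_{A_4} \\
&= A_1 - \varepsilon_{21}A_2 - \varepsilon_{12}A_3 + \varepsilon_{21}\varepsilon_{12}A_4 = 0.
\end{aligned}
\tag{12}
$$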


It can be further shown (using the i.i.d. assumption and
the independence of the sources) that

$$E[A_1^2] = \frac{1}{T} E[\Psi(s_1)^2] \cdot E[s_2^2 I(|s_2| < C)]. \tag{17}$$

Substituting into (16), we obtain:

$$E[\varepsilon_{12}^2] = \frac{1}{T} \cdot \frac{E[\Psi(s_1)^2]}{E[\Psi'(s_1)]^2\, E[s_2^2 I(|s_2| < C)]}. \tag{18}$$


Note that $A_1, A_2, A_3, A_4$ are all time-averages of some
bounded functions of random variables. As such, due
to the i.i.d. assumption, they converge to their respective
mean values, which are all finite. We may therefore
neglect the term $A_4$ (which converges to $E[A_4] = 0$
and is further multiplied by $\varepsilon_{12}\varepsilon_{21}$). $A_2, A_3$ converge to
some nonzero value while $A_1$ converges to zero ($E[A_1] =
E[\Psi(s_1)s_2 I(|s_2| < C)] = E[\Psi(s_1)] \cdot E[s_2 I(|s_2| < C)] = 0$
because $s_1, s_2$ are independent and have symmetric density
and $\Psi(x)$ is odd). Finally, replacing $A_2, A_3$ with
their means, (12) becomes:

$$A_1 - \varepsilon_{21}E[A_2] - \varepsilon_{12}E[A_3] = 0 \tag{13}$$


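where

$$E[A_2] = E[\Psi(s_1)s_1 I(|s_2| < C)] = E[\Psi(s_1)s_1] \cdot P(|s_2| < C)$$

and

$$E[A_3] = E[\Psi'(s_1)s_2^2 I(|s_2| < C)] = E[\Psi'(s_1)] \cdot E[s_2^2 I(|s_2| < C)]$$

since $s_1, s_2$ are independent. $P(|s_2| < C)$ is the probability
that $|s_2| < C$. Note that $\lim_{C \to \infty} E[A_3] = \infty$
while $\lim_{C \to \infty} E[A_2] = E[\Psi(s_1)s_1] < \infty$. So increasing
$C$ would result in $E[A_3] \gg E[A_2]$ as $T \to \infty$, and we have
$A_1 - \varepsilon_{12}E[A_3] \approx 0$, or equivalently

$$\varepsilon_{12} \approx \frac{A_1}{E[\Psi'(s_1)] \cdot E[s_2^2 I(|s_2| < C)]}.$$

Substituting the definition of $A_1$ we have

$$\varepsilon_{12} \approx \frac{\hat{E}[\Psi(s_1)s_2 I(|s_2| < C)]}{E[\Psi'(s_1)]\, E[s_2^2 I(|s_2| < C)]}. \tag{14}$$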

Note that when $T \to \infty$, $A_1$ is the sum of infinitely many
i.i.d. random variables (with finite variance). The central
limit theorem guarantees that $\varepsilon_{12}$ converges in distribution
to a Normal random variable with mean:

$$E[\varepsilon_{12}] = \frac{E[A_1]}{E[\Psi'(s_1)] \cdot E[s_2^2 I(|s_2| < C)]} = 0 \tag{15}$$

(since $E[A_1] = 0$), and with variance:

$$E[\varepsilon_{12}^2] = \frac{E[A_1^2]}{E[\Psi'(s_1)]^2 \cdot E[s_2^2 I(|s_2| < C)]^2}. \tag{16}$$

The approximation $E[A_3] \gg E[A_2]$ was used under the
assumption that $C$ is "large" and $T \to \infty$. When this is
not the case ($C$ is not "large" enough), the error can be
found by solving equation (13) together with the analogous
equation derived from (4b) in the same manner.


IV.  Discussion 

Obviously, the RQML estimate is sub-optimal, since the
indicator functions discard occurrences of "outliers",
which often carry valuable information. Its only advantage
is that in the RQML framework all moments exist, so that a
"small errors" $L_2$ performance analysis becomes possible.
Moreover, the analytic results obtained carry implications
for the attainable performance of better, non-RQML (e.g., QML)
estimates. They allow quantification of the relative effect of
the separating function $\Psi(x)$ on the RQML performance;
this relative effect would be maintained as the constant
$C$ is increased, and it is therefore characteristic of QML
as well. Furthermore, interesting implications for the
attainable error convergence rate (with respect to the
observation length $T$) can be deduced.

Specifically,  the  two  following  conclusions  can  be  drawn 
from  the  analysis: 

1. The asymptotic performance relates to the nonlinear
separating function $\Psi(x)$ through the factor $E[\Psi(x)^2]/E[\Psi'(x)]^2$,
where $E[\cdot]$ denotes the expectation with respect to the
true distribution of the corresponding source (note that
this factor is common to some other estimation problems
using the QML approach [3]). This relation provides some
insight on how to choose the separation function $\Psi(x)$, or
equivalently, predicts the performance degradation due to
the use of suboptimal separation functions. When $\Psi(x)$
is the true score function, that is, $\Psi(x) = -f'(x)/f(x)$ (where
$f(x)$ is the true p.d.f. of the corresponding source),
performance is optimized.

2. The variance of the optimal (non-RQML) estimator's
error must converge to zero faster than the regular rate of
$1/T$. This property, sometimes termed "super-efficiency"
(e.g., [5]), has so far been observed (in the context of BSS)
only for finite-support source distributions ([1]). It can
be deduced here from (18) using the following argument:
Assume that an optimal estimator exists, whose error
variance decreases as $1/T$, tending (asymptotically) to
$\rho/T$ where $\rho$ is some constant. Now from (18), the RQML
factor multiplying $1/T$ can be arbitrarily decreased below
any positive $\rho$ by increasing $C$: for every "large enough"
$T_0$, there exists some $C_0$ that would yield a smaller variance
than $\rho/T$ for all $T > T_0$. This contradicts the optimality
of the presumed optimal estimator. We therefore
deduce that the optimal estimator's error variance must
decrease at a faster rate than $1/T$.
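
The dependence of the factor in (18) on $C$ can be made concrete in one special case. The sketch below is only an illustration under assumptions that are not taken from the text: both the source and the separating function are standard Cauchy (so $\Psi(x) = 2x/(1+x^2)$), for which $E[\Psi^2] = 1/2$, $E[\Psi'] = 1/2$ and $E[s^2 I(|s|<C)] = (2/\pi)(C - \arctan C)$ in closed form. The function names are hypothetical.

```python
import math

def truncated_second_moment_cauchy(c):
    """E[s^2 * I(|s| < c)] for a standard Cauchy source (assumed here)."""
    return (2.0 / math.pi) * (c - math.atan(c))

def rqml_variance_factor(c):
    """Factor multiplying 1/T in (18) for the assumed Cauchy/Cauchy case.

    With psi(x) = 2x / (1 + x^2) and a standard Cauchy source,
    E[psi^2] = 1/2 and E[psi'] = 1/2, so E[psi^2] / E[psi']^2 = 2.
    """
    score_ratio = 0.5 / (0.5 ** 2)
    return score_ratio / truncated_second_moment_cauchy(c)

# The factor shrinks roughly like 1/C as the limiting constant grows.
for c in (10.0, 120.0, 1000.0):
    print(c, rqml_variance_factor(c))
```

Because the truncated second moment grows without bound in $C$ for heavy-tailed sources, the factor can be pushed below any fixed positive constant, which is the step the argument in item 2 relies on.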

As mentioned earlier, as $C \to \infty$, RQML approaches
QML. However, the effective value of $C$ is closely related
to  the  observation  length  T.  With  C  fixed,  the  proba¬ 
bility  of  occurrence  of  an  outlier  (strong  enough  to  cause 
truncation)  obviously  increases  monotonically  with  T.  In 
other  words,  for  any  arbitrarily  large  C,  the  performance 
of  RQML  will  only  approach  that  of  QML  up  to  a  certain 
value  of  T,  and  would  then  remain  significantly  worse  as 
T  increases. 

In order to asymptotically improve RQML's performance,
we can consider increasing $C$ as the observation
time $T$ increases (hence denoting $C$ as $C_T$). Note that
if (18) still holds (with $C$ replaced by $C_T$), then the
RQML error variance would decrease faster than $1/T$.
Thus, a key question here is to what extent we can increase
$C_T$ with $T$ while maintaining the validity of (18).

The answer to this question naturally depends on the
sources' distribution (more precisely, on the decreasing
rate of their tails). For example, we prove elsewhere [4]
that when the sources are Symmetric $\alpha$-Stable ($S\alpha S$) signals
with parameter $\alpha$ (which means that their tails decrease
as $1/|x|^{\alpha+1}$ for "large" $|x|$), $C_T$ can be increased as
$C_T \sim T^{1/\alpha - \delta}$ (where $\delta$ is some arbitrarily small positive
number). Consequently, the error variance will decrease
at least as fast as $1/T^{2/\alpha - \delta}$.
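
The exponent can be checked with a short calculation. This is only a sketch, assuming that the source p.d.f. behaves as $|x|^{-(\alpha+1)}$ for large $|x|$ with $0 < \alpha < 2$, that (18) remains valid with $C$ replaced by $C_T$, and that $C_T \sim T^{1/\alpha - \delta}$:

$$E[s_2^2 I(|s_2| < C_T)] \propto \int^{C_T} x^2 \cdot x^{-(\alpha+1)}\,dx \propto C_T^{2-\alpha},$$

$$E[\varepsilon_{12}^2] \propto \frac{1}{T\,C_T^{2-\alpha}} \sim T^{-1-(2-\alpha)(1/\alpha - \delta)} = T^{-\left(2/\alpha - (2-\alpha)\delta\right)}.$$

Since $(2-\alpha)\delta$ is itself an arbitrarily small positive number, this has the form $1/T^{2/\alpha - \delta}$ up to a relabeling of $\delta$.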

The following figure demonstrates the agreement of
some simulation results with our theoretical analysis. We
used two symmetric $\alpha$-stable sources, and applied the
RQML approach using the score function of the Cauchy
distribution as the separating function $\Psi(x)$. The '+'-s indicate
simulation results (as a function of $T$) in terms of the
mean square error of $\varepsilon_{12}$. The solid line describes the
expected performance (18). As expected, asymptotically
the simulation results agree with our analytic results.

In addition, we present simulation results ('o') obtained
by applying the QML algorithm (with the same
$\Psi(x)$) to the same data. The dashed line indicates a slope
corresponding to an error convergence rate of $1/T^{2/\alpha}$,
which is seen to fit the QML simulation results, as expected
from our discussion above. The vertical position of
this line was determined manually, since we do not have
a closed-form expression for the QML performance.

Note  also,  that  with  the  chosen  value  of  C,  the  RQML 
error  follows  the  QML  error  for  the  smaller  values  of  T, 
and  then  departs  to  values  that  are  significantly  worse 
than  QML.  By  using  an  increased  value  for  C,  the  de¬ 
parture  point  could  be  delayed  (in  T).  Naturally,  how¬ 
ever,  RQML  (with  C  fixed)  would  always  be  asymptoti¬ 
cally  worse  than  QML,  having  a  decrease  rate  of  1/T  vs. 


Fig. 1. Simulation results for two symmetric $\alpha$-stable sources
with parameters $\alpha = 1.2$ and $\alpha = 1.2$. $\Psi(x)$ is the score function
of a symmetric Cauchy ($\alpha = 1$) distribution with $\sigma = 1$. The
solid line represents (18) (for $C = 120$), with which the RQML
simulation results ('+') are seen to agree asymptotically. The
QML results ('o') exhibit super-efficiency, with the predicted
decrease rate of $1/T^{2/\alpha}$. Each point represents the average
result of 500 independent experiments.


V.  Conclusion 

We  addressed  the  performance  analysis  of  BSS  in  the 
context  of  ’’heavy-tailed”  signals.  When  these  signals  do 
not  have  finite  second-order  moments,  standard  analysis 
tools  (formerly  used,  e.g.,  to  analyze  the  QML  estimate) 
are  no  longer  useful. 

To enable $L_2$ error analysis, we introduced the RQML
sub-optimal estimate, which is parameterized by a
limiting constant $C$, such that when $C \to \infty$, RQML
approaches QML. Using "small errors" analysis, we obtained
expressions for the RQML performance.

Using these results, we concluded that the optimal estimator's
performance must be super-efficient, in the sense
that its mean squared error must decrease (asymptotically)
at a rate faster than $1/T$. More specifically, we
demonstrated that for symmetric $\alpha$-stable sources, the
decrease rate of QML is $1/T^{2/\alpha}$.

ABSTRACT

In our earlier work, we introduced a class of stochastic processes
obeying a structure of the form $E[X(t)X(t\lambda)] = R(\lambda)$, $t, \lambda > 0$,
and outlined a mathematical framework for the modeling and analysis
of these processes. We referred to this class of processes as scale
stationary processes. We demonstrated that the scale stationarity
framework leads to engineering-oriented mathematical tools and concepts,
such as autocorrelation and spectral density functions and finite
parameter ARMA models, for the modeling and analysis of statistically
self-similar signals. In this work, we introduce a state space
representation for self-similar signals and systems based on scale
stationary ARMA models. Such a representation provides a complete
description of the inner and outer dynamics of a self-similar system
or signal that cannot be obtained from a transfer function
representation. We also introduce Kalman filtering techniques and Riccati
equations for smoothing and prediction of self-similar processes.

1.  INTRODUCTION 

$1/f$ processes occur in a broad range of engineering and science
applications, including network traffic, noise in electronic devices,
biomedical systems, and burst errors in communication channels, to
mention a few [1]-[9].

The major characteristics of these processes are their long term
correlation structure and their statistical self-similarity. These
characteristics are apparent in the empirical $1/f^{\gamma}$ power spectrum.
Typically, the parameter $\gamma$ controls both the degree of long term
correlation and the statistical self-similarity. Mathematical tools
and concepts for such processes were first formulated and advocated
in practice by Mandelbrot [2] within the context of "fractals". He
proposed the well-known fractional Brownian motion (fBm) model
to capture the long term correlation and statistical self-similarity of
$1/f$ processes. Given the elaborate fBm model, and the aura
of "fractal science", a flurry of activity evolved around the modeling
and analysis of $1/f$ processes in the engineering literature [3]-[6].
However, these efforts never gained strong ground in engineering
applications, mainly due to the mathematical intractability of the fBm
model and the lack of foundational principles. In [1], Yazici et al.
proposed a class of second order processes obeying a structure of
the form $E[X(t)X(t\lambda)] = R(\lambda)$, $t, \lambda > 0$, to model and analyze
$1/f$ processes. These models, referred to as scale stationary, enjoy
theoretical properties parallel to those of ordinary wide sense stationary
processes. Most importantly, their foundation is an extension
of the concept of stationarity, from which powerful time series
analysis tools are derived. Scale stationary processes come with
spectral analysis tools and ARMA models, just like ordinary
stationary processes. They are also directly linked to linear scale
invariant systems. Let us not forget to mention that the fBm model is
simply a trended scale stationary model with stationary increments.
It may be academically disappointing, but it is true that the issue of
"statistical self-similarity" can be managed to a large degree by the
simple framework of "scale stationarity".

In [1], the authors introduced scale stationary ARMA models based
on the Euler-Cauchy system and showed that any scale stationary
process can be captured by a finite parameter scale stationary
autoregressive model. In this study, we extend the ARMA modeling to
multiple input and multiple output (MIMO) systems and propose
a state space representation for self-similar processes. At first
glance, both the state space representation and the Kalman filter may
appear simply as time-varying models. However, with the proper
definition of the derivative operation on the multiplicative group and
of self-similarity, both the state space model and the Kalman filter
are captured with a constant matrix-vector representation. This new
definition of the derivative operation guides the implementation of
the Kalman filter for self-similar processes, both in the recursive update
and in the Riccati equation, leading to performance superior to that of
the ordinary time-varying implementation.

The  proposed  state  space  representation  and  the  Kalman  filter¬ 
ing  can  be  used  in  estimation,  and  prediction  tasks  involving  1/f 
phenomena.  Applications  include  inverse  filtering  for  communica¬ 
tion  channels  and  blurred  images  in  which  the  blur  or  the  channel  is 
time  varying  and  the  underlying  data  and  noise  have  1/f  character¬ 
istics.  Another  obvious  application  of  the  tool  is  in  communication 
network  traffic  prediction  which  has  potential  implications  in  net¬ 
work  management  and  quality  service  provisioning. 

The organization of the paper is as follows: Section 2 covers the
basic background on scale stationary processes and scale stationary
ARMA modeling. Section 3 presents the state space representation
and the derivative operator for functions defined on the multiplicative
group. Section 4 introduces the Kalman filter for self-similar
processes. Section 5 discusses the implementation of the Kalman filter
and the Riccati equation and presents some simulation results.
Section 6 discusses the applications of the proposed Kalman filter
in various engineering problems. Finally, Section 7 concludes the
discussion.

2.  BACKGROUND  ON  SELF-SIMILAR  PROCESSES 


Before giving the derivation of our state space model and Kalman
filtering, we summarize the related background information on
self-similar processes, as introduced in detail in [1]. A linear system
satisfying

$$S\{x(t\lambda)\} = \lambda^{-H} y(t\lambda) \tag{1}$$

is called a Linear Self-Similar (LSS) system with self-similarity
parameter $H$. As can be seen from this definition, analogous to LTI
systems, which are invariant to time shifts, LSS systems are invariant
to scale changes within a constant parameter.

The output of the LSS system to any input is found by a scale
convolution operation defined as:

$$y(t) = t^{H} \int_0^{\infty} h\!\left(\frac{t}{\lambda}\right) u(\lambda)\, d\ln\lambda, \qquad t > 0 \tag{2}$$

where $t^{H}h(t)$ is the response of the system to the unit driving force
$\delta(t)$ [1], defined by: i) $\delta(t) = 0$, $t \neq 1$, $t > 0$; ii) $\int_0^{\infty} \delta(t/\lambda)\, d\ln\lambda = 1$,
$t > 0$; iii) $x(t) = \int_0^{\infty} x(\lambda)\,\delta(t/\lambda)\, d\ln\lambda$.

A linear dynamical model for LSS systems is represented by
time-varying Euler-Cauchy type differential equations:

$$a_N t^N \frac{d^N}{dt^N} y(t) + \cdots + a_1 t \frac{d}{dt} y(t) + a_0 y(t) = b_M t^{M+H} \frac{d^M}{dt^M} u(t) + \cdots + b_1 t^{1+H} \frac{d}{dt} u(t) + b_0 t^{H} u(t) \tag{3}$$

This type of system satisfies the self-similarity definition in (1).
What distinguishes the Euler-Cauchy system is that the dynamics of
the system are captured in scale derivatives, defined on the
multiplicative group [10] as:

$$t \frac{d}{dt} y(t) = \lim_{\lambda \to 1} \frac{y(t\lambda) - y(t)}{\ln \lambda} \tag{4}$$

Since  the  model  is  invariant  to  scale  changes,  the  memory  of  the 
system  is  stored  in  infinitesimal  time  scalings,  similar  to  the  Euler 
dynamical  model  for  LTI  systems  where  the  memory  of  the  system 
is  stored  in  infinitesimal  time  shifts  since  these  systems  are  time 
invariant. 
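
The scale derivative in (4) can be checked numerically. The following is a minimal sketch (the function names and the test signal $y(t) = t^{1.5}$ are illustrative choices, not from the paper): it compares the scale difference quotient for a $\lambda$ close to 1 against $t\,dy/dt$.

```python
import math

def scale_derivative(y, t, lam=1.0 + 1e-6):
    """Difference quotient in scale: (y(t * lam) - y(t)) / ln(lam)."""
    return (y(t * lam) - y(t)) / math.log(lam)

def y(t):
    # Illustrative power-law signal; any differentiable function would do.
    return t ** 1.5

t0 = 2.0
approx = scale_derivative(y, t0)
exact = t0 * 1.5 * t0 ** 0.5  # t * dy/dt evaluated analytically
print(approx, exact)
```

For a power law $t^{p}$ the scale derivative is $p\,t^{p}$, so power laws play the role that exponentials play for the ordinary derivative.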

In a probabilistic setting, the Euler-Cauchy system generates
self-similar processes with a $1/f$ spectrum [1]. Using the input-output
relationship of the LSS system (2), the power spectrum of the
Euler-Cauchy system in the Fourier domain, driven by a white noise having
autocorrelation $\sigma^2 \delta(t_2/t_1)$, is shown to have a power-law
or $1/f$ spectrum [1]. Therefore, any $1/f$ process can be
approximated with a finite order Euler-Cauchy system, which makes
signal processing techniques for estimation and prediction of such
processes possible. It is because of these facts that we use
Euler-Cauchy systems in the derivation of the state space representation
and the Kalman filtering algorithm for LSS systems.


3.  STATE  SPACE  REPRESENTATION  OF  SELF-SIMILAR 
PROCESSES 

Beginning from the Euler-Cauchy system in (3), the general state
space representation with states having different self-similarity
parameters can be obtained as:

$$t \frac{d}{dt}\mathbf{x}(t) = t^{\mathbf{H}}(\mathbf{A} + \mathbf{H})t^{-\mathbf{H}}\mathbf{x}(t) + t^{\mathbf{H}}\mathbf{B}\mathbf{u}(t) \tag{5}$$

$$\mathbf{y}(t) = \mathbf{C}\mathbf{x}(t) + \mathbf{D}\mathbf{u}(t) \tag{6}$$

where $\mathbf{x}(t) = [x_1(t)\ x_2(t)\ \ldots\ x_N(t)]^T$ ($[\cdot]^T$ is the transpose
operation) is the $N \times 1$ state vector, $\mathbf{u}(t)$ is the $R \times 1$ input vector, $\mathbf{y}(t)$ is
the $M \times 1$ output vector, $\mathbf{A}$ is an $N \times N$ matrix, $\mathbf{B}$ is an $N \times R$ matrix, $\mathbf{C}$
is an $M \times N$ matrix, $\mathbf{D}$ is an $M \times R$ matrix and $\mathbf{H}$ is an $N \times N$ diagonal
matrix having values $H_1, H_2, \ldots, H_N$ in its diagonal entries.

In this representation, the self-similarity parameters $H_i$, $i = 1, \ldots, N$,
of the states can be equal, in which case the external system
representation reduces to the Euler-Cauchy system in (3). However,
it is not very realistic for all states to have the same self-similarity,
and this is a specific case of the general form. Therefore, we use the most
general state space representation, with the states having different
$H_i$s, throughout the paper.

It can be argued that the state space representation for LSS systems
can be expressed as first order time-varying ordinary differential
equations $\frac{d}{dt}\mathbf{x}(t) = \frac{t^{\mathbf{H}}(\mathbf{A}+\mathbf{H})t^{-\mathbf{H}}}{t}\mathbf{x}(t) + \frac{t^{\mathbf{H}}\mathbf{B}}{t}\mathbf{u}(t)$, and that
time-varying state space techniques can be used in their analysis. In this
type of representation, the memory of the states is captured in
infinitesimal time shifts, as in LTI systems. However, for
LSS systems, expressing the inner dynamics of the whole system
with first order self-similar Euler-Cauchy systems as states is more
appropriate to the nature of the dynamics of the system. This is
because the states are also self-similar in nature; therefore,
their energy should be stored in infinitesimal time scalings, as
in (4).

In order to analyze the self-similar dynamics of the states more
closely, let us consider the general $k$th state, $x_k(t)$, whose dynamical
equation is:

$$t \frac{d}{dt} x_k(t) = (a_{k,k} + H_k)x_k(t) + t^{H_k} \sum_{\substack{l=1 \\ l \neq k}}^{N} a_{k,l}\, t^{-H_l} x_l(t) + t^{H_k}\mathbf{B}\mathbf{u}(t) \tag{7}$$

As can be seen from this equation, the dynamics of a state are affected
by the state itself, the other states and the input, depending on the $\mathbf{A}$ and
$\mathbf{B}$ matrices. The dependency on the state itself can be seen as an
intrinsic self-similarity, since the self-similarity parameter appears
as a constant gain factor. If there is coupling to the other states,
these states can be treated as inputs, where the self-similarity of the
state is provided by the fractional or self-similar leakage term $t^{H_k}$.
Note here that the self-similarity of the coupled states $x_l(t)$ for $l =
1, \ldots, N$ and $l \neq k$, with parameters $H_l$, does not have an effect on
the self-similarity of the state $x_k(t)$, which guarantees a self-similar
first order system for each state $x_k(t)$ with only one self-similarity
parameter, $H_k$.

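The solution of the states can be found using the state transition
matrix $\Phi(t,\tau)$ as:

$$\mathbf{x}(t) = \Phi(t,t_1)\mathbf{x}(t_1) + \int_{t_1}^{t} \Phi(t,\tau)\,\tau^{\mathbf{H}}\mathbf{B}\mathbf{u}(\tau)\, d\ln\tau \tag{8}$$

where $\Phi(t,\tau)$ can be obtained using the fundamental matrix $\Phi(t) =
t^{\mathbf{H}}t^{\mathbf{A}}$, which is a solution of the homogeneous state equation in (5):

$$\Phi(t,\tau) = \Phi(t)\Phi^{-1}(\tau) = t^{\mathbf{H}}\left(\frac{t}{\tau}\right)^{\mathbf{A}}\tau^{-\mathbf{H}} \tag{9}$$

Note here that the state transition matrix is also a solution of the
homogeneous state equation and that it satisfies the same properties as
its LTI counterpart [12]. The unit driving force response is found as
$\mathbf{h}(t,\tau) = t^{\mathbf{H}}(t/\tau)^{\mathbf{A}}\mathbf{B}$ using (8). Then the solutions for the states and
the outputs are:

$$\mathbf{x}(t) = t^{\mathbf{H}}t^{\mathbf{A}}\mathbf{x}(t_1) + t^{\mathbf{H}}\int_{t_1}^{t}\left(\frac{t}{\tau}\right)^{\mathbf{A}}\mathbf{B}\mathbf{u}(\tau)\, d\ln\tau$$

$$\mathbf{y}(t) = \mathbf{C}t^{\mathbf{H}}t^{\mathbf{A}}\mathbf{x}(t_1) + \mathbf{C}t^{\mathbf{H}}\int_{t_1}^{t}\left(\frac{t}{\tau}\right)^{\mathbf{A}}\mathbf{B}\mathbf{u}(\tau)\, d\ln\tau \tag{10}$$

Although each state $x_k(t)$ for $k = 1, \ldots, N$ is self-similar with
self-similarity parameter $H_k$, depending on the matrix $\mathbf{C}$ the outputs
$y_j(t)$ for $j = 1, \ldots, M$ can be expressed with either one self-similar
state or a linear combination of self-similar states with different
self-similarity parameters.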
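
For a single state with no input, the homogeneous part of (5) reduces to $t\,dx/dt = (A + H)x$, whose solution is $x(t) = (t/t_1)^{A+H}x(t_1)$. The sketch below is an illustration under that scalar, zero-input assumption (the function names and the step count are hypothetical choices): it propagates the state in geometric time steps using the scale derivative and compares the result with the closed form.

```python
import math

def propagate_state(x1, a_plus_h, lam, n_steps):
    """Advance t -> lam * t repeatedly with x <- x + ln(lam) * (A + H) * x."""
    h = math.log(lam)
    x = x1
    for _ in range(n_steps):
        x = x + h * a_plus_h * x
    return x

def closed_form_state(x1, a_plus_h, t, t1=1.0):
    """Scalar state transition (t / t1)^(A + H) applied to x(t1)."""
    return (t / t1) ** a_plus_h * x1

lam, n_steps = 1.01, 300
a_plus_h = -0.1 + -0.2          # A + H for an assumed first order system
t_end = lam ** n_steps          # time reached after n_steps geometric steps
x_num = propagate_state(1.0, a_plus_h, lam, n_steps)
x_exact = closed_form_state(1.0, a_plus_h, t_end)
print(t_end, x_num, x_exact)
```

The recursion uses a constant coefficient at every step, which is the practical benefit of working with scale steps rather than uniform time steps.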

4.  KALMAN  FILTERING 

In this section we investigate the problem of estimating the state
variables of a self-similar process by using noisy measurements of
a linear combination of the states. Consider the LSS system in
state space:

$$t \frac{d}{dt}\mathbf{x}(t) = \mathbf{A}(t)\mathbf{x}(t) + \mathbf{B}(t)\mathbf{w}(t) \tag{11}$$

$$\mathbf{y}(t) = \mathbf{C}\mathbf{x}(t) + \mathbf{v}(t) \tag{12}$$

where $\mathbf{A}(t) = t^{\mathbf{H}}(\mathbf{A} + \mathbf{H})t^{-\mathbf{H}}$ and $\mathbf{B}(t) = t^{\mathbf{H}}\mathbf{B}$. Equation (11) is
the system model, where $\mathbf{w}(t)$ is the system noise, and equation (12)
is the measurement model, where $\mathbf{v}(t)$ is the measurement noise.
Both $\mathbf{w}(t)$ and $\mathbf{v}(t)$ are zero mean white Gaussian noise with
covariances:

$$E\{\mathbf{w}(t)\mathbf{w}^T(\tau)\} = \mathbf{Q}(t)\,\delta(t/\tau)$$

$$E\{\mathbf{v}(t)\mathbf{v}^T(\tau)\} = \mathbf{R}(t)\,\delta(t/\tau) \tag{13}$$

They are also uncorrelated with each other and with the states.

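Assuming that $\mathbf{A}(t)$, $\mathbf{B}(t)$, $\mathbf{C}$ and $\mathbf{H}$ are completely known, the
state estimate $\hat{\mathbf{x}}(t)$ is obtained by feeding a correction term back to
the estimated system, depending on the difference between the actual
measurement and the estimated measurement:

$$t \frac{d}{dt}\hat{\mathbf{x}}(t) = t^{\mathbf{H}}(\mathbf{A} + \mathbf{H})t^{-\mathbf{H}}\hat{\mathbf{x}}(t) + \mathbf{K}(t)(\mathbf{y}(t) - \mathbf{C}\hat{\mathbf{x}}(t)) \tag{14}$$

where $\mathbf{K}(t)$ is the Kalman gain matrix, which has to be chosen
optimally.

Then the error states between the estimated and actual states,
$\tilde{\mathbf{x}}(t) = \hat{\mathbf{x}}(t) - \mathbf{x}(t)$, satisfy:

$$t \frac{d}{dt}\tilde{\mathbf{x}}(t) = (\mathbf{A}(t) - \mathbf{K}(t)\mathbf{C})\tilde{\mathbf{x}}(t) + \mathbf{K}(t)\mathbf{v}(t) - \mathbf{B}(t)\mathbf{w}(t) \tag{15}$$

Using the solution of the error state in terms of its state transition
matrix $\Phi_{\tilde{x}}(t,\tau)$, the covariance of the error state $\mathbf{P}(t)$ is:

$$
\begin{aligned}
\mathbf{P}(t) = E\{\tilde{\mathbf{x}}(t)\tilde{\mathbf{x}}^T(t)\} &= \Phi_{\tilde{x}}(t,t_1)\mathbf{P}(t_1)\Phi_{\tilde{x}}^T(t,t_1) \\
&\quad + \int_{t_1}^{t} \Phi_{\tilde{x}}(t,\tau)\mathbf{K}(\tau)\mathbf{R}(\tau)\mathbf{K}^T(\tau)\Phi_{\tilde{x}}^T(t,\tau)\, d\ln\tau \\
&\quad + \int_{t_1}^{t} \Phi_{\tilde{x}}(t,\tau)\mathbf{B}(\tau)\mathbf{Q}(\tau)\mathbf{B}^T(\tau)\Phi_{\tilde{x}}^T(t,\tau)\, d\ln\tau
\end{aligned}
\tag{16}
$$

where $\mathbf{P}(t_1)$ is the initial error covariance matrix at the initial time $t_1$.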

For the estimated state to be optimal, the error should be minimized
in time via $\mathbf{K}(t)$. In order to find the optimum Kalman filter
gain, the cost function $J(t)$ related to the error state,

$$J(t) = E\{\tilde{\mathbf{x}}^T(t)\tilde{\mathbf{x}}(t)\} = \mathrm{Trace}\{\mathbf{P}(t)\}, \tag{17}$$

should be minimized in the MMSE sense.

After some manipulations, as explained in [11], the Kalman gain
matrix that minimizes the cost function is found as:

$$\mathbf{K}(t) = \mathbf{P}(t)\mathbf{C}^T\mathbf{R}^{-1}(t) \tag{18}$$

Then, using this $\mathbf{K}(t)$ in (16), the change in the error covariance, or
the Riccati equation, can be obtained as:

$$t \frac{d}{dt}\mathbf{P}(t) = \mathbf{A}(t)\mathbf{P}(t) + \mathbf{P}(t)\mathbf{A}^T(t) + \mathbf{B}(t)\mathbf{Q}(t)\mathbf{B}^T(t) - \mathbf{K}(t)\mathbf{R}(t)\mathbf{K}^T(t) \tag{19}$$

Here, the Kalman filtering algorithm has the same structure as
its LTI counterpart. The major difference between our algorithm and the
LTI case lies in the state update (14) and error covariance propagation
(19) equations. Here, the memory is captured in infinitesimal
time scalings rather than in time shifts as in the LTI case.
Therefore, the self-similar nature of the state estimate and error
covariance is preserved.


5.  IMPLEMENTATION  AND  SIMULATION  RESULTS 

We simulate a first order LSS system with parameters $H = -0.2$,
$A = -0.1$, $B = 0.1$, $C = 1$, $Q = 1$ to test the performance of the
proposed Kalman filter.

We generate the $1/f$ data $x(t)$ via a covariance method that
uses the Karhunen-Loeve (KL) transform. The autocovariance of $x(t)$
for the first order system given above is

$$C_{xx}(t_1,t_2) = \beta\,(t_1 t_2)^{(A+H)}\left(\max(t_1,t_2)^{(-2A)} - 1\right), \tag{20}$$

where $\beta = B^2 Q/(-2A)$. Then, using this covariance matrix and the
KL transform, we generate the $1/f$ data $x(t)$ for $1 \leq t \leq 20$. In the
Kalman filtering algorithm, the estimated state in continuous time is
approximated using the scale derivative definition (4) in geometric
time intervals:

$$\hat{x}(\lambda t) = \hat{x}(t) + \ln\lambda\,\big(A(t)\hat{x}(t) + K(t)(y(t) - C\hat{x}(t))\big) \tag{21}$$

and the Riccati equation solution:

$$\mathbf{A}(t)\mathbf{P}(t) + \mathbf{P}(t)\mathbf{A}^T(t) + \mathbf{B}(t)\mathbf{Q}(t)\mathbf{B}^T(t) - \mathbf{K}(t)\mathbf{R}(t)\mathbf{K}^T(t) = 0 \tag{22}$$

is obtained using the Schur algorithm as given in [13]. The
continuous time approximation becomes more accurate when the scale step
$\lambda$ is selected as close to 1 as possible. In our application we
select $\lambda = 1.01$.
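
The geometric-step update (21) can be sketched for the scalar first order system as follows. This is an illustrative simplification and not the authors' implementation: the error covariance is propagated with an Euler step of (19) in $\ln t$ instead of solving (22) with the Schur algorithm, the data are generated by a simple Euler-Maruyama recursion rather than by the KL method, and the mapping between the sampled measurement variance `meas_var` and the noise intensity `R` is an assumption of this sketch. All names are hypothetical.

```python
import math
import random

def simulate_and_filter(A=-0.1, H=-0.2, B=0.1, C=1.0, Q=1.0,
                        meas_var=0.01, lam=1.01, t_end=20.0, seed=0):
    """Scalar scale-domain Kalman filter on geometric time steps.

    Returns (mean squared filter error, mean squared measurement error,
    final error covariance P).
    """
    rng = random.Random(seed)
    h = math.log(lam)        # step in log-time, i.e. ln(lambda)
    R = meas_var * h         # assumed noise intensity for sampled noise
    a = A + H                # scalar value of t^H (A + H) t^-H
    t, x, x_hat, P = 1.0, 0.0, 0.0, 0.0
    se_filter = se_meas = 0.0
    n = 0
    while t < t_end:
        b = t ** H * B                       # B(t) = t^H B
        y = C * x + rng.gauss(0.0, math.sqrt(meas_var))
        K = P * C / R                        # gain, as in (18)
        se_filter += (x_hat - x) ** 2
        se_meas += (y - C * x) ** 2
        n += 1
        # State estimate update over one scale step, as in (21).
        x_hat = x_hat + h * (a * x_hat + K * (y - C * x_hat))
        # Euler step of the Riccati equation (19) in log-time.
        P = P + h * (2.0 * a * P + b * b * Q - K * K * R)
        # True state driven by scale-white noise (Euler-Maruyama).
        x = x + h * a * x + b * math.sqrt(Q * h) * rng.gauss(0.0, 1.0)
        t *= lam
    return se_filter / n, se_meas / n, P

mse_filter, mse_meas, p_final = simulate_and_filter()
print(mse_filter, mse_meas, p_final)
```

Every coefficient in the recursion except $B(t) = t^{H}B$ is constant across steps, which reflects the constant matrix structure of the scale-domain model.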

We test the performance of the Kalman filter for two different
SNRs, 20 dB and 10 dB, using 100 Monte Carlo Runs (MCR). The
SNR of the signal is calculated as:

$$SNR = \mathrm{var}(x)/\mathrm{var}(v) \tag{23}$$

Sample data $x(t)$ (solid line), $y(t)$ (dash-dot line) and the
estimated data $\hat{x}(t)$ (dashed line) for $SNR = 20$ dB and 10 dB, out of the 100
MCR, are given in Figures 1 and 2, respectively.

Then the estimation $SNR'$ for each estimated signal $\hat{x}(t)$ is
calculated as:

$$SNR' = \mathrm{var}(x)/\mathrm{var}(\hat{x} - x) \tag{24}$$
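
The variance ratios in (23) and (24) can be computed as in the following sketch. The conversion to decibels as $10\log_{10}$ of the ratio is an assumption of this example (the text reports values in dB without stating the conversion), and the function names are hypothetical.

```python
import math
import statistics

def snr_db(signal, noise):
    """10*log10(var(signal) / var(noise)), the ratio in (23) in dB."""
    return 10.0 * math.log10(statistics.pvariance(signal) /
                             statistics.pvariance(noise))

def estimation_snr_db(x, x_hat):
    """Ratio in (24) in dB: var(x) against var(x_hat - x)."""
    error = [xh - xi for xh, xi in zip(x_hat, x)]
    return snr_db(x, error)

x = [0.0, 2.0, 0.0, 2.0]          # toy signal with variance 1
v = [0.1, -0.1, 0.1, -0.1]        # toy noise with variance 0.01
print(snr_db(x, v))
```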

For input SNRs of 20 dB and 10 dB, the estimation
$SNR'$ values over the 100 MCR lie in the ranges 7.73 dB $< SNR' <$
8.15 dB and 3.02 dB $< SNR' <$ 3.69 dB, with mean values of
7.92 dB and 3.36 dB, respectively.

Let  us  mention  that  the  proposed  Kalman  filter  also  suffers  from 
the  same  problems  as  the  usual  Kalman  filter,  such  as  the  build¬ 
ing  up  of  the  “random  walk”  type  error  as  the  prediction  time  in¬ 
creases.  This  problem  can  be  overcome  with  the  usage  of  a  back¬ 
ward  smoother  if  the  offline  processing  is  possible. 

6.  APPLICATION  AREAS 

In this section, we explain two potential application areas of the
proposed Kalman filtering procedure: 1) packet arrival estimation in
self-similar network traffic and 2) time-varying fading channel
estimation during self-similar signal transmission in wireless
communication applications.

Network traffic studies show that aggregated packet arrivals exhibit
the same statistics over different time scales: long-range correlations
that decay hyperbolically, a slowly decaying variance of the sample
mean, and a power spectrum that obeys a power law near the origin. This
observation is apparently valid for Ethernet traffic, ISDN packet
networks and signaling (CCSN/SS7) networks for public telephone
networks [7, 8]. If $x_1, x_2, x_3, \ldots$ denote the number of
arrivals in the first, second, ... interval, the aggregates of these
arrivals over consecutive, non-overlapping blocks of $m$ intervals are
calculated as follows: let $x_1^{(m)}$ denote the mean arrival rate of
the first $m$ intervals, $(x_1 + x_2 + \ldots + x_m)/m$, let
$x_k^{(m)}$ denote $(x_{m(k-1)+1} + \ldots + x_{km})/m$, and so on.
These aggregate processes describe the whole arrival process at
different time scales. Since the aggregate arrival process shows the
same long-range statistics, with slowly decaying variances and
self-similar characteristics, it is a self-similar process, contrary to
the early assumption of a Poisson distribution. Therefore, analysis
techniques for traffic density must take this self-similar nature into
account. In particular, since the buffering requirements for
self-similar processes are larger than those estimated with Poisson
processes, the techniques for estimating buffer size should be selected
carefully. Using the proposed Kalman filtering technique, self-similar
data traffic can be predicted recursively.
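
The block aggregation described above can be sketched as follows. The helper name `aggregate` and the choice to drop an incomplete trailing block are assumptions made for this illustration.

```python
def aggregate(x, m):
    """Mean arrival rate over consecutive, non-overlapping blocks of m intervals.

    x[k*m:(k+1)*m] corresponds to x_{m(k)+1}, ..., x_{(k+1)m} in 1-based
    notation. A trailing block shorter than m is dropped (an assumption).
    """
    n_blocks = len(x) // m
    return [sum(x[k * m:(k + 1) * m]) / m for k in range(n_blocks)]
```

Computing the variance of `aggregate(x, m)` for increasing `m` gives a simple diagnostic: for independent arrivals it falls off roughly like 1/m, whereas a slower decay is the behaviour the text associates with self-similar traffic.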

Another application area for the proposed Kalman filter is in
communications. In present wireless communication applications
such as radar, sonar, acoustics, etc., the transmission channel is
usually modeled as a multipath fading channel with slowly
time-varying characteristics. At the receiver end, the signal
transmitted through the multipath fading channel is further corrupted
by noise. It is an important and difficult task to deconvolve the
original signal from the received data, especially when the transmitted
signal and the corrupting noise are nonstationary or of $1/f$ type. To
the best of our knowledge, there is only one work in the literature [9]
that solves this problem optimally, using a multiscale Wiener filter in
the wavelet domain. As an alternative, the proposed Kalman filter can
be used for the estimation and prediction of the transmitted $1/f$
signal from the observation data in a recursive fashion, with no extra
wavelet filtering steps needed.

7.  CONCLUSION 

In this paper, we have developed a continuous-time state space
representation and an optimal state estimation algorithm using Kalman
filtering for self-similar processes. Beginning from the most general
and mathematically tractable dynamical representation, namely the
Euler-Cauchy type differential equation definition of $1/f$ processes,
the dynamics of the states are represented with respect to the
multiplicative group derivatives, where the memory is captured in
infinitesimal scalings of time.

Using this state space representation, we formulate the
continuous-time Kalman filter to estimate or predict the self-similar
or $1/f$ data. Although the algorithm appears to be in the same form as
for LTI systems, the major difference is once again in the memory
content, or the dynamics of the estimated state and the error
covariance matrix, which is appropriate to capture the self-similar
nature of the statistics.

This work can be extended in several further research directions.
Here we assumed that the state space system parameters, i.e. A,
B, C and D, are available. However, this may not be possible in
some real-time applications, such as network traffic or fading
channels in communication networks. Therefore, a generalized
Kalman filtering technique that estimates and updates the unknown
system matrices can be investigated further. The framework can also be
extended and tested for 2D self-similar signals, for example in
deblurring of textured images.
ABSTRACT

In this paper, we present a Bayesian approach for DOA (Direction Of
Arrival) and frequency estimation of narrow band signals in additive
generalized Gaussian noise. Using Bayesian techniques, the
posterior probability densities for the DOA and frequency parameters
are derived from the signal and noise models. These posterior
probabilities are then used in the Metropolis-Hastings (M-H) algorithm
to draw samples of the DOA and frequency parameters. The
performance of our algorithms is studied by plotting the
MSEs (Mean Square Errors) of the parameters for various
SNRs. The MSEs of the parameters are compared with
the CRLBs (Cramer-Rao Lower Bounds) for the generalized
Gaussian models.

Keywords: Non-Gaussian signal processing, Sensor array processing,
Bayesian estimation, M-H algorithm.

1.  INTRODUCTION 

Sensor array processing has found important applications in
many areas such as radar, sonar, communications and seismic
exploration. Determination of the DOAs and the frequencies of the
transmitted signals are two of the main problems in sensor array
processing. Various methods have been proposed for estimating the DOA
and associated parameters of multiple plane-wave signals incident on an
array of sensors. These include subspace-based methods [5] and maximum
likelihood techniques [4]. The signal processing literature
has traditionally been dominated by Gaussian noise model
assumptions. However, many classes of noise encountered
in the real world, such as underwater acoustic noise, low frequency
electromagnetic disturbances [3] and atmospheric noise, exhibit
outliers that will not fit into a Gaussian noise model.

In this paper, we will present a Bayesian approach to
estimate the DOAs and frequencies of the signals in generalized
Gaussian noise. The generalized Gaussian model has been applied
successfully to a variety of physical phenomena. For instance, the
density estimates of underwater acoustic returns of surface and bottom
reverberation bear a strong resemblance to the members of the
generalized Gaussian family with a wide range of values of the shape
parameter (or decay rate parameter), corresponding to heavy- as
well as light-tailed distributions.


2.  PROBLEM  FORMULATION 

In this section, we define the signal and noise models and the array
structure. We assume that M signal sources in the far field
transmit narrow band signals (with a centre frequency $f_0$) and that
the received data at a Uniform Linear Array (ULA) of sensors are
corrupted by additive noise. The ULA has L (> M) sensors with an
inter-sensor distance d (< $\lambda/2$, where $\lambda$ is the
wavelength of the signal). In this paper, n denotes normalized time
(with respect to the sampling interval $T_s$) and $\Re$ denotes the
real part. We can represent the received data $\mathbf{x}(n)$ of the M
signals in terms of their steering matrix $\mathbf{A}(\theta)$, signal
vector $\mathbf{s}(n)$ and noise vector $\mathbf{v}(n)$ as

$\mathbf{x}(n) = \Re\{\mathbf{A}(\theta)\mathbf{s}(n)\} + \mathbf{v}(n)$  for $n = 1, 2, \ldots, N$  (1)

where $\mathbf{x}(n)$ and $\mathbf{v}(n)$ are $L \times 1$ vectors
denoting the received signals and the generalized Gaussian noise
samples at time n respectively, and $\mathbf{A}(\theta)$ is an
$L \times M$ matrix

$\mathbf{A}(\theta) = [\mathbf{a}(\theta_1), \mathbf{a}(\theta_2), \ldots, \mathbf{a}(\theta_M)]$  (2)

where $\mathbf{a}(\theta_i)$ is the $L \times 1$ steering vector of the
ith signal, given by

$\mathbf{a}(\theta_i) = [1, \exp(j\phi_i), \ldots, \exp(j(L-1)\phi_i)]^T$  (3)

with

$\phi_i = 2\pi f_0 d \sin(\theta_i)/c$  for $i = 1, 2, \ldots, M$  (4)

In the above equation, c and $\theta_i$ are the speed of wave
propagation in the medium and the DOA of the ith signal, respectively.
$\mathbf{s}(n)$ in (1) is an $M \times 1$ vector

$\mathbf{s}(n) = [s_1(n), \ldots, s_M(n)]^T$  (5)

Taking into account all the samples in $\mathbf{x}(n)$, $\mathbf{s}(n)$
and $\mathbf{v}(n)$ for $n = 1, \ldots, N$, we can rewrite (1) as

$\mathbf{X} = \Re\{\mathbf{A}(\theta)\mathbf{S}\} + \mathbf{V}$  (6)

where $\mathbf{X}$ and $\mathbf{V}$ are $L \times N$ matrices

$\mathbf{X} = [\mathbf{x}(1), \ldots, \mathbf{x}(N)]$  (7)

$\mathbf{V} = [\mathbf{v}(1), \ldots, \mathbf{v}(N)]$  (8)


0-7803-7011-2/01/$10.00 ©2001 IEEE




and $\mathbf{S}$ is an $M \times N$ matrix

$\mathbf{S} = [\mathbf{s}(1), \ldots, \mathbf{s}(N)]$  (9)
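
The array model (1) to (4) can be sketched without external libraries. This is an illustration under the stated model only: the function names are hypothetical, angles are taken in radians, and the noise term is left out.

```python
import cmath
import math


def steering_vector(theta, L, f0, d, c):
    """Steering vector (3) for one source, with phase increment phi from (4)."""
    phi = 2.0 * math.pi * f0 * d * math.sin(theta) / c
    return [cmath.exp(1j * l * phi) for l in range(L)]


def noiseless_snapshot(thetas, s, L, f0, d, c):
    """Real part of A(theta) s(n) for one time index, i.e. (1) without v(n).

    thetas: list of M DOAs in radians; s: list of M complex signal samples.
    """
    columns = [steering_vector(th, L, f0, d, c) for th in thetas]
    return [
        sum(columns[m][l] * s[m] for m in range(len(thetas))).real
        for l in range(L)
    ]
```

Adding independent generalized Gaussian noise samples to each entry of the returned list would complete one column of X in (6).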

A generalized Gaussian pdf is given [2] by

$p(v) = \frac{1}{2\sigma\Gamma(1+1/\alpha)B(\alpha)} \exp\left(-\left|\frac{v-\mu}{\sigma B(\alpha)}\right|^{\alpha}\right)$  (10)


where $B(\alpha) = [\Gamma(1/\alpha)/\Gamma(3/\alpha)]^{1/2}$. In this
model $\mu$, $\sigma^2$ (with $\sigma > 0$) and $\alpha$ (> 0) denote
the mean, variance and decay rate of the density function,
respectively. Smaller values of $\alpha$ correspond to heavier-tailed
distributions, which, in turn, are indicative of impulsive noise
environments. In this paper, we use only zero-mean generalized
Gaussian noise models. Assuming that the noise samples are
statistically independent of one another both along the array sensors
(spatially) and along time (temporally), a likelihood expression
for the received data follows from (6) and (10)


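$p_{lGG}(\mathbf{X} \mid \theta, \mathbf{f}, \sigma, \alpha, M) = \left(2\sigma\Gamma(1+1/\alpha)B(\alpha)\right)^{-NL} \prod_{l=1}^{L}\prod_{n=1}^{N} \exp\left(-\left|\frac{X(l,n) - \Re\{A(l,:)S(:,n)\}}{\sigma B(\alpha)}\right|^{\alpha}\right)$  (11)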
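
The density (10) and the likelihood (11) can be sketched in log form, which is usually more stable numerically than multiplying NL small factors. The function names are illustrative, and `residuals` stands for the NL values X(l,n) minus the real part of A(l,:)S(:,n), assumed to be computed elsewhere.

```python
import math


def B(alpha):
    """Scale factor B(alpha) = sqrt(Gamma(1/alpha) / Gamma(3/alpha))."""
    return math.sqrt(math.gamma(1.0 / alpha) / math.gamma(3.0 / alpha))


def gg_logpdf(v, sigma, alpha, mu=0.0):
    """Logarithm of the generalized Gaussian density (10)."""
    scale = sigma * B(alpha)
    log_norm = -math.log(2.0 * scale * math.gamma(1.0 + 1.0 / alpha))
    return log_norm - abs((v - mu) / scale) ** alpha


def gg_loglik(residuals, sigma, alpha):
    """Logarithm of (11) for independent zero-mean noise samples."""
    return sum(gg_logpdf(r, sigma, alpha) for r in residuals)
```

As a sanity check on the parameterisation, `alpha = 2` reduces to the Gaussian density with standard deviation `sigma`, and `alpha = 1` to a Laplacian density with the same variance.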


3.  DERIVATION  OF  BAYESIAN  ESTIMATORS 
3.1.  Priors 

Assigning various priors for the signal and noise parameters, we
use Bayesian principles to define a posterior density. Each
DOA is assumed to be uniformly distributed between 0 and
$\pi$. As we are dealing with narrow band signals of bandwidth
$f_{BW}$, we assume that the frequency of each signal is
also uniformly distributed in the interval
$[f_0 - f_{BW}/2,\ f_0 + f_{BW}/2]$. Thus, the priors for the DOA
vector $\theta$ and frequency vector $\mathbf{f}$ are defined by

$p(\theta \mid M) = (1/\pi)^M, \qquad p(\mathbf{f} \mid M) = (1/f_{BW})^M$  (12)

A non-informative Jeffreys' prior $p(\sigma) = 1/\sigma$ and a uniform
prior $p(\alpha)$ are assigned for the parameters $\sigma$ and $\alpha$
respectively.


3.2.  Posterior  Density  Derivation 

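When the noise is modelled by a generalized Gaussian pdf,
a posterior density for the unknown parameters can be obtained
from Bayes' theorem as

$p_{GG}(\theta, \mathbf{f}, \sigma, \alpha \mid \mathbf{X}, M) \propto p_{lGG}(\mathbf{X} \mid \theta, \mathbf{f}, \sigma, \alpha, M)\, p(\theta \mid M)\, p(\mathbf{f} \mid M)\, p(\sigma)\, p(\alpha)$  (13)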


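Substituting (11), (12), $p(\sigma)$ and $p(\alpha)$ into (13), after
some manipulation, we obtain

$p_{GG}(\theta, \mathbf{f}, \sigma, \alpha \mid \mathbf{X}, M) \propto \sigma^{-(NL+1)} \left(2\Gamma(1+1/\alpha)B(\alpha)\right)^{-NL} \exp\left(-\sigma^{-\alpha}\sum_{l=1}^{L}\sum_{n=1}^{N}\left|\frac{X(l,n)-\Re\{A(l,:)S(:,n)\}}{B(\alpha)}\right|^{\alpha}\right) \left(\frac{1}{\pi}\right)^{M} \left(\frac{1}{f_{BW}}\right)^{M} p(\alpha)$  (14)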

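The noise parameter $\sigma$ can be integrated out from (14) as

$p_{GG}(\theta, \mathbf{f}, \alpha \mid \mathbf{X}, M) \propto \int_0^{\infty} p_{GG}(\theta, \mathbf{f}, \sigma, \alpha \mid \mathbf{X}, M)\, d\sigma$  (15)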

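We can analytically perform this integration using the gamma
integral. Thus, the marginalized posterior density is given
by

$p_{GG}(\theta, \mathbf{f}, \alpha \mid \mathbf{X}, M) \propto \left(2\Gamma(1+1/\alpha)B(\alpha)\right)^{-NL}\, \Gamma(NL/\alpha) \left[\sum_{l=1}^{L}\sum_{n=1}^{N}\left|\frac{X(l,n)-\Re\{A(l,:)S(:,n)\}}{B(\alpha)}\right|^{\alpha}\right]^{-NL/\alpha} \left(\frac{1}{\pi}\right)^{M} \left(\frac{1}{f_{BW}}\right)^{M} p(\alpha)$  (16)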
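
The gamma-integral step can be made explicit. As a sketch, write $D$ (a shorthand introduced here, not the paper's notation) for the double sum in the exponent of (14); the substitution $u = \sigma^{-\alpha} D$ then gives:

```latex
D = \sum_{l=1}^{L}\sum_{n=1}^{N}
    \left| \frac{X(l,n) - \Re\{A(l,:)S(:,n)\}}{B(\alpha)} \right|^{\alpha},
\qquad
\int_0^{\infty} \sigma^{-(NL+1)} e^{-\sigma^{-\alpha} D}\, d\sigma
  = \frac{1}{\alpha}\, D^{-NL/\alpha} \int_0^{\infty} u^{NL/\alpha - 1} e^{-u}\, du
  = \frac{1}{\alpha}\, \Gamma\!\left(\frac{NL}{\alpha}\right) D^{-NL/\alpha}.
```

This accounts for the gamma-function factor and the power $-NL/\alpha$ of the double sum in (16). The substitution also produces a factor $1/\alpha$; it is constant for the frequency and DOA updates, but it depends on $\alpha$ and so would matter in an update of $\alpha$.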


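4. MCMC ALGORITHMS FOR PARAMETER ESTIMATION

We use an M-H algorithm [1] to estimate the parameters
from the posterior density $p_{GG}$. We implement the algorithm
in three M-H steps as shown below:

• Initialization: Assign initial values to the parameters:
$\theta^0$, $\mathbf{f}^0$, $\alpha^0$.

• Iteration: for i = 1 to $ite_{max}$

1) Update the frequency vector:
Perform an M-H step with $p_{GG}(\mathbf{f} \mid \theta^{i-1}, \alpha^{i-1}, \mathbf{X}, M)$
as the invariant density. Sample $\mathbf{f}^n \sim \mathcal{N}(\cdot \mid \mathbf{f}^{i-1}, \sigma_f^2)$
and accept with probability:

$\rho_a = \min\left\{1, \frac{p_{GG}(\mathbf{f}^n \mid \theta^{i-1}, \alpha^{i-1}, \mathbf{X}, M)}{p_{GG}(\mathbf{f}^{i-1} \mid \theta^{i-1}, \alpha^{i-1}, \mathbf{X}, M)}\right\}$  (17)

2) Update the DOA vector:
Perform an M-H step with $p_{GG}(\theta \mid \mathbf{f}^{i}, \alpha^{i-1}, \mathbf{X}, M)$ as
the invariant density. Sample $\theta^n \sim \mathcal{N}(\cdot \mid \theta^{i-1}, \sigma_\theta^2)$
and accept with probability:

$\rho_b = \min\left\{1, \frac{p_{GG}(\theta^n \mid \mathbf{f}^{i}, \alpha^{i-1}, \mathbf{X}, M)}{p_{GG}(\theta^{i-1} \mid \mathbf{f}^{i}, \alpha^{i-1}, \mathbf{X}, M)}\right\}$  (18)

3) Update the noise parameter $\alpha$:
Perform an M-H step with $p_{GG}(\alpha \mid \mathbf{f}^{i}, \theta^{i}, \mathbf{X}, M)$ as
the invariant density. Sample $\alpha^n \sim \mathcal{R}(\cdot \mid \alpha^{i-1}, \sigma_\alpha^2)$
and accept with probability:

$\rho_c = \min\left\{1, \frac{p_{GG}(\alpha^n \mid \mathbf{f}^{i}, \theta^{i}, \mathbf{X}, M)\, \alpha^{i-1}}{p_{GG}(\alpha^{i-1} \mid \mathbf{f}^{i}, \theta^{i}, \mathbf{X}, M)\, \alpha^{n}}\right\}$  (19)

• end iteration.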
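
One of the symmetric-proposal updates above (steps 1 and 2) can be sketched as a generic random-walk Metropolis step on a scalar parameter. This is an illustration, not the authors' implementation: `log_target` is a placeholder for the logarithm of the relevant conditional posterior, and working in the log domain is a choice made here for numerical stability.

```python
import math
import random


def mh_step(current, log_target, proposal_std, rng):
    """One random-walk Metropolis update of a scalar parameter.

    A candidate is drawn from a normal proposal centred on the current
    value and accepted with probability min(1, target ratio).
    Returns (new_value, accepted).
    """
    candidate = rng.gauss(current, proposal_std)
    log_ratio = log_target(candidate) - log_target(current)
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
    if math.log(u) <= min(0.0, log_ratio):
        return candidate, True
    return current, False
```

A full iteration would call such a step once per block of parameters, each time holding the other blocks at their most recent values, as in steps 1) to 3).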


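In the above M-H algorithm, $\mathcal{N}(\cdot)$ denotes the normal
distribution. At the ith iteration, the pdf corresponding to
the Rice distribution $\mathcal{R}(\cdot)$ is defined by

$p_{\mathcal{R}}(z) = \frac{z}{\sigma_\alpha^2} \exp\left(-\frac{z^2 + (\alpha^{i-1})^2}{2\sigma_\alpha^2}\right) I_0\left(\frac{z\,\alpha^{i-1}}{\sigma_\alpha^2}\right), \quad z \ge 0$  (20)

where the random variable $z = \sqrt{a^2 + b^2}$. The random variable
a is normal with mean $\alpha^{i-1}$ and variance $\sigma_\alpha^2$, and b
is normal with zero mean and variance $\sigma_\alpha^2$. $I_0(\cdot)$ in
the above pdf is the modified Bessel function of the first kind of order
zero. When $\alpha^{i-1} = 0$, the above pdf is equivalent to a
Rayleigh density.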
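
The construction of the Rice variate from two normal draws translates directly into code. The sketch below also evaluates the proposal correction that enters (19): because the exponential and Bessel factors of the Rice density are symmetric in the two arguments, the ratio of reverse to forward proposal densities reduces to (previous value)/(candidate). Function names are illustrative.

```python
import math
import random


def sample_rice(prev, std, rng):
    """Draw z = sqrt(a^2 + b^2) with a ~ N(prev, std^2) and b ~ N(0, std^2)."""
    a = rng.gauss(prev, std)
    b = rng.gauss(0.0, std)
    return math.hypot(a, b)


def log_accept_ratio(log_post_cand, log_post_prev, prev, cand):
    """Log of the ratio inside the min{1, .} of (19).

    Adds log(prev/cand), the Rice proposal correction, to the log
    posterior difference. Both prev and cand must be positive.
    """
    return (log_post_cand - log_post_prev) + math.log(prev) - math.log(cand)
```

A candidate drawn this way is always non-negative, which suits a parameter constrained to be positive.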

Multivariate normal distributions are used as the proposal
distributions for updating frequencies and DOAs. The
mean values of these normal densities are the corresponding
DOA and frequency estimates at the previous iteration.
The invariant/target density at each M-H step can be formulated
from the posterior density $p_G$.

Similarly, we use normal distributions as the proposal
distributions for DOAs and frequencies in the generalized
Gaussian noise case. The noise parameter $\alpha$ is sampled
from $\mathcal{R}(\cdot)$. The M-H algorithm for the generalized
Gaussian noise case is similar to the above algorithm with minor
differences. In the above M-H algorithms, $\sigma_f^2$, $\sigma_\theta^2$
and $\sigma_\alpha^2$ are the variances of the corresponding proposal
distributions. The constant $ite_{max}$ denotes the maximum number
of iterations. One must carefully choose these variances to
obtain an algorithm with good mixing properties.


5.  SIMULATIONS  AND  DISCUSSION 

To test the performance of the Bayesian estimators in generalized
Gaussian noise models, several experiments were
designed. As the first sensor of the array was used as the
reference element, the phases of the received data at the
first sensor did not depend on the DOAs of the transmitted
signals. Hence, the data at the first sensor was used to obtain
initial estimates for the frequencies of the transmitted
signals. This helped to ensure fast convergence. In most
of the experiments discussed below, the estimate of an unknown
parameter was obtained by taking the mean of the
corresponding samples after the M-H algorithm converged.
Then, the MSE of an unknown parameter was computed by
forming the mean of the MSEs from 50 Monte Carlo runs.
In the following experiments, M = 2 (exponential signals),
d = 1 m, c = 300 m/s, N = 64 and the sampling frequency
$f_s$ = 200 Hz.

5.1.  Performance  and  Convergence  Properties  of 
the  Generalized  Gaussian  Estimator 

In order to study the convergence properties of our algorithm,
we used it to estimate the unknown parameters at smaller values of
$\alpha$, corresponding to heavier-tailed distributions. In this
experiment, L = 8 and the DOAs and the frequencies of the signals are
[30° 40°] and [47 Hz 52 Hz] respectively. The number of iterations
($ite_{max}$) was 3,000. Fixing the value of $\sigma$ at 0.5, the
signal and noise parameters were evaluated at two different values of
$\alpha$: 0.8 and 1.3. Figure 1 shows the evolution of both DOAs with
iteration number. We can see from these figures that the
M-H sampler converged to the target DOA values, 30° and
40°, within 200 iterations. At the 1500th iteration we reduced
the variances of the proposal densities of the DOAs. Consequently,
the mixing properties of our sampler changed after the 1500th
iteration (see figure 1). The histograms of the frequencies and
$\alpha$ are given in figures 2 and 3 respectively. As expected, the
samples are centred around the target frequencies, 47 Hz and 52 Hz,
and the target $\alpha$, 0.8 and 1.3. To study the performance of the
Bayesian estimator in generalized Gaussian noise, the value of
$\sigma$ was varied from 0.1 to 1 while fixing the value of $\alpha$ at
1.5. For each $\sigma$, the number of iterations ($ite_{max}$) was
3,000. The MSEs of $\theta$, $\mathbf{f}$ and $\alpha$ were computed
from 50 Monte Carlo runs. Figures 4(a) and 4(b) show the MSEs and CRLBs
of the DOAs and frequencies respectively. As expected, the
MSE decreased with increasing SNR and approached the
CRLB at high SNRs.

Figure 1: Evolution of DOAs with iteration number: (a) $\alpha$ = 0.8,
(b) $\alpha$ = 1.3

Figure 2: Histograms of frequencies: (a) $\alpha$ = 0.8 (1st source),
(b) $\alpha$ = 1.3 (2nd source), (c) $\alpha$ = 0.8 (1st source),
(d) $\alpha$ = 1.3 (2nd source)


Figure 3: Histograms of $\alpha$: (a) $\alpha$ = 0.8, (b) $\alpha$ = 1.3

Figure 4: MSEs of the (a) DOAs, (b) frequencies in generalized
Gaussian noise

5.2. Resolution Capabilities of the Estimators

In this section we analyze the resolution capabilities of
our generalized Gaussian estimator. We used two exponential
signals whose frequencies were 50 Hz and assumed to
be known. The data length (N), inter-sensor spacing (d),
speed of the waveform (c) and sampling frequency ($f_s$) were
the same as in the last experiment.

In the first experiment the DOA of the second signal was
varied while the DOA of the first signal was kept constant
at 10°. The number of sensors used in this experiment was
5. We did 50 Monte Carlo runs for each DOA of the second
signal, running the M-H algorithm for 3,000 iterations. The
MSEs of both DOAs were estimated using the generalized
Gaussian estimators and the results are plotted against the
angular separation in figure 5(a). As expected, the MSEs of
both DOAs decreased with increasing angular separation.
However, these figures show that for well-separated DOAs
(> 8°), the resolution of the DOAs was only limited by the
amount of noise.

In the second experiment we studied how the MSEs
of the DOAs were affected by the array size (number of
sensors). The DOAs of the signals were 10° and 15°. We
did 50 Monte Carlo runs for each array size, running the
M-H algorithm for 3,000 iterations. The MSEs of the DOAs
were estimated using the generalized Gaussian estimators
and the results are displayed in figure 5(b). As expected, the
MSEs of the DOAs decreased with increasing array size. It
is also clear from these figures that the resolution of the
DOAs was affected more by the amount of noise than by the
array size (when L > 7).

Figure 5: MSEs of the DOAs vs (a) angular separation, (b)
array size


6.  CONCLUSIONS 

In this paper we developed a Bayesian estimator for a
generalized Gaussian noise model. We have shown that
the MSEs of the estimates approached the CRLBs at high
SNRs. Our simulation results demonstrated that our algorithm
converged within 200 iterations.

Studying the resolution capabilities of our Bayesian estimator,
we showed that the MSEs of the DOAs decreased
with increasing angular separation and array size. However,
our simulations showed that for well-separated DOAs
(> 8°) or for a large array size (L > 7), the resolution of
the DOAs was only limited by the amount of noise.
ABSTRACT

Inspired by robust estimation, nonlinear denoising methods
combining the mean, the median, and the LogCauchy filters are
proposed. Some statistical and asymptotic properties are studied, and
comparisons with other nonlinear filtering schemes are performed.
Experimental results showing a much improved performance of
the proposed filters in the presence of Gaussian and heavy-tailed
noise are analyzed and illustrated.

1.  INTRODUCTION 

A variety of models have been used for modeling impulsive noise,
including the Laplacian model, whose distribution has heavier tails
than the Gaussian. Examples of impulsive noise include atmospheric
noise, cellular communication, underwater acoustics, and
moving traffic. Recently, it has been shown that $\alpha$-stable
($0 < \alpha \le 2$) distributions can approximate impulsive noise more
accurately than other models [1]. The parameter $\alpha$ controls the
degree of impulsiveness (heaviness of the tails), and the impulsiveness
increases as $\alpha$ decreases. The Gaussian ($\alpha = 2$) and the
Cauchy ($\alpha = 1$) distributions are the only symmetric
$\alpha$-stable distributions which have closed-form probability
density functions. The two most important properties of $\alpha$-stable
distributions are the stability property and the Generalized Central
Limit Theorem [1].

It is also known that in the presence of only Gaussian noise,
the efficiency of a median filter leaves room for much improvement
relative to that of a mean filter [2]. This has led to a number of
other nonlinear schemes being proposed to strike a balance between the
two. Among these are the Wilcoxon and Hodges-Lehmann filters [2].

Approaches to wavelet-based denoising have generally relied
on the assumption of Gaussian noise, and are therefore sensitive
to outliers, i.e., to noise distributions whose tails are heavier than
those of the Gaussian distribution, such as the Laplacian distribution.
For independent $\epsilon$-contaminated Gaussian distributions of the
wavelet coefficients, Krim and Schick [4] derive a robust estimator of
the wavelet coefficients based on minimax description length.

In the next section, we provide a brief review of the Huber
minimax approach, some basic sliding window filters and symmetric
$\alpha$-stable ($S\alpha S$) distributions. In Section 3, a nonlinear
filtering structure called the Mean-Median filter is introduced and its
asymptotic analysis is performed. Section 4 is devoted to another class
of nonlinear denoising techniques called Mean-LogCauchy filters.

This  work  was  supported  by  an  AFOSR  grant  F49620-98-1-0190  and 
by  ONR-MURI  grant  JHU-72798-S2  and  by  NCSU  School  of  Engineer¬ 
ing. 


Finally, in Section 5, we provide experimental results showing
a much improved performance of the proposed filters at removing
noise from images corrupted by $\epsilon$-contaminated Gaussian and
heavy-tailed noise, while preserving image structures well.

2.  BACKGROUND 

Consider the additive noise model

$X_i = S_i + V_i, \quad i \in \mathbb{Z}^m,$  (1)

where $\{S_i\}$ is a discrete m-dimensional deterministic sequence
corrupted by the zero-mean noise sequence $\{V_i\}$, and $\{X_i\}$ is
the observed sequence. The objective is to estimate the sequence $S_i$
based on a filtering output $Y_i = \mathcal{F}(X_i)$, where
$\mathcal{F}$ is a filtering operator.

Here, we assume that the noise probability distribution is a
scaled version of a known member of the family of
$\epsilon$-contaminated normal neighborhoods proposed by Huber [3]

$\mathcal{P}_\epsilon = \{(1-\epsilon)\Phi + \epsilon H : H \in \mathcal{S}\},$

where $\Phi$ is the standard normal distribution, $\mathcal{S}$ is the
set of all probability distributions symmetric with respect to the
origin (i.e. such that $H(-x) = 1 - H(x)$) and $\epsilon \in [0, 1]$ is
the known fraction of “contamination”. The presence of outliers in a
nominally normal sample can be modeled here by a distribution H with
tails heavier than normal. Note that symmetry ensures the unbiasedness
of the maximum likelihood estimator, making the expression
for its asymptotic variance considerably simpler. Krim and Schick
[4] proposed a robust wavelet thresholding technique based on the
minimax description length (MMDL) principle, determining the
least favorable distribution in the $\mathcal{P}_\epsilon$ family as
the member that maximizes the entropy. The MMDL approach results in a
thresholding scheme that is resistant to heavy-tailed noise.

Let W be a sliding window of size 2N + 1. Define
$W_i = \{X_{i+r} : r \in W\}$ to be the window centered at location i.
The output of the mean filter is given by

$Y_i = \bar{W}_i = \arg\min_{\theta} \sum_{r \in W} (X_{i+r} - \theta)^2,$  (2)

where $\bar{W}_i$ is the sample mean of the window $W_i$.

Denote by $[W_i]_{(k)}$ the k-th order statistic of the samples in
$W_i$, that is
$[W_i]_{(1)} \le [W_i]_{(2)} \le \cdots \le [W_i]_{(2N+1)}$.

The output of the standard median (SM) filter is given by

$Y_i = [W_i]_{(N+1)} = \arg\min_{\theta} \sum_{r \in W} |X_{i+r} - \theta|.$  (3)
Such estimators are well founded and well known for Gaussian
and Laplacian distributions. Note that the mean and median filters
are the maximum likelihood estimators of the location parameter
for the Gaussian and Laplacian distributions, respectively.

The general class of $\alpha$-stable distributions has also been shown
to accurately model heavy-tailed noise [1]. A symmetric $\alpha$-stable
($S\alpha S$) random variable is, however, only described by its
characteristic function

$\varphi(t) = \exp(j\theta t - \gamma|t|^{\alpha}),$

where $j \in \mathbb{C}$ is the imaginary unit, $\theta \in \mathbb{R}$
is the location parameter (centrality), $\gamma \in \mathbb{R}$ is the
dispersion of the distribution and $\alpha \in (0, 2]$, which controls
the heaviness of the tails, is the characteristic exponent [1].

When $\alpha \in (0, 2)$, an $S\alpha S$ random variable has infinite
variance, and the Cauchy ($\alpha = 1$) is the only distribution which
has a closed form for the probability density function. This is in fact
useful when using the principle of maximum likelihood estimation.

The LogCauchy ($LC_\gamma$) filter [5] is the maximum log-likelihood
estimator of the location parameter for a Cauchy density, and yields
the following

$Y_i = LC_\gamma(W_i) = \arg\min_{\theta} \sum_{r \in W} \log\left(\gamma^2 + (X_{i+r} - \theta)^2\right),$  (4)

where $\gamma$ is the dispersion, and $\theta$ is the estimation
parameter.

3. THE MEAN-MEDIAN FILTER

From Eqs. (2) and (3), it can easily be seen that the mean filter is
optimal for Gaussian noise in the sense of mean square error, while
the standard median filter is optimal for Laplacian noise in the sense
of mean absolute error. Assume that the noise probability distribution
P is a scaled version of a member of $\mathcal{P}_\epsilon$, i.e.
$P = (1 - \epsilon)G + \epsilon L$, where G is Gaussian
$\mathcal{N}(0, \sigma_G^2)$ with variance $\sigma_G^2$, and L is
Laplacian (or double-exponential) $\mathcal{L}(0, \sigma_L^2)$ with
variance $\sigma_L^2$ (clearly $L \in \mathcal{S}$). This assumption of
$\epsilon$-contaminated Gaussian and Laplacian distributed noise is
motivated by the fact that heavier tails than the Gaussian are provided
by the Laplace distribution, which is used as a contaminant of the
Gaussian distribution. A convex combination of the mean and the median
filters can be defined as follows.

Definition 1 The output of the Mean-Median (MEM) filter is given
by

$Y_i = (1 - \lambda)\bar{W}_i + \lambda [W_i]_{(N+1)}, \quad \lambda \in [0, 1].$

As a suitable performance measure for a robust estimator, Huber
suggests its asymptotic variance, since the sample variance is
strongly dependent on the tails of the distribution. Indeed, for any
estimator whose value is always contained within the convex hull
of the observations, the supremum of its actual variance is infinite.
For this and other reasons, the performance analysis of the mean-median
filter is carried out using its asymptotic variance.

The asymptotic variance V(T, F) of an estimator T at the
distribution F is then given by [3]

$V(T, F) = \int IF(x; T, F)^2 \, dF(x),$  (5)

where IF(x; T, F) is the influence function of T at F, defined as

$IF(x; T, F) = \lim_{t \to 0} \frac{T((1-t)F + t\Delta_x) - T(F)}{t}$

at all points x where the limit exists, and $\Delta_x$ stands for the
delta distribution function, i.e. with unit mass at x. The influence
function gives the effect of an infinitesimal perturbation to the data
at the point x.

It can be shown that the influence functions of the mean and the
median filters are given by [3]

$IF(x; \bar{W}_i, F_\theta) = x - \theta,$

and

$IF(x; [W_i]_{(N+1)}, F_\theta) = \frac{\mathrm{sign}(x - \theta)}{2f(0)}.$

Then it follows that the influence function of the MEM filter is
given by

$IF(x; \mathrm{MEM}, F_\theta) = (1 - \lambda)(x - \theta) + \lambda \frac{\mathrm{sign}(x - \theta)}{2f(0)}.$  (6)

Using (5) and (6), the following result holds.

Proposition 1 The asymptotic variance $V(\mathrm{MEM}, F_\theta)$ of
the MEM filter at the distribution $F_\theta$ is given by

$V(\mathrm{MEM}, F_\theta) = (1 - \lambda)^2 \mu_2 + \frac{\lambda^2}{4f^2(0)} + \lambda(1 - \lambda)\frac{\mu_1}{f(0)},$  (7)

where $\mu_k = E|X - \theta|^k$, k = 1, 2.

Remark: While the independence assumption of the filter input
simplifies the tractability of the problem, it is not strictly valid.

Minimizing (7) over $\lambda$, we obtain the minimum attainable
asymptotic variance, and the filter attaining that minimum asymptotic
variance will then provide the best filtering performance.

Corollary 1 The minimum value of $V(\mathrm{MEM}, F_\theta)$ is
attained at $\lambda_{\min}$ given by


Example: If the input is i.i.d. $\mathcal{N}(\theta, \sigma^2)$, then
using (8), we obtain

$\lambda_{\min} = 2/(2 + \pi)$.
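
The MEM filter of Definition 1 can be sketched for a one-dimensional sequence as follows. The function name is illustrative, and the edge handling (replicating the first and last samples so that every window has 2N + 1 entries) is an assumption, since the text does not specify boundary treatment.

```python
from statistics import mean, median


def mem_filter(x, N, lam):
    """Mean-Median filter: (1 - lam) * window mean + lam * window median.

    x: input sequence, N: half window size (window length 2N + 1),
    lam: mixing weight in [0, 1]. Edges are padded by replication.
    """
    padded = [x[0]] * N + list(x) + [x[-1]] * N
    out = []
    for i in range(len(x)):
        window = padded[i:i + 2 * N + 1]
        out.append((1.0 - lam) * mean(window) + lam * median(window))
    return out
```

For example, on the sequence `[0, 0, 9, 0, 0]` with `N = 1`, setting `lam = 1.0` removes the isolated spike entirely, while `lam = 0.0` smears it over three samples; intermediate values trade off the two behaviours.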

4.  MEAN-LOGCAUCHY  FILTERS 

The LogCauchy filter has been shown to outperform the standard
median filter in removing highly impulsive $\alpha$-stable noise [5]. The MEM
filter can therefore be improved by replacing the median with the
LogCauchy filter, which yields a new class of nonlinear filters.

Now we assume that the noise probability distribution $P$ is a
scaled version of a member of $\mathcal{P}_\epsilon$ such that $P = (1-\epsilon)G + \epsilon S$,
where $G$ is Gaussian $\mathcal{N}(0,\sigma_g^2)$ and $S$ is S$\alpha$S with location
parameter $\theta$ and dispersion $\gamma_s$. The parameter $\alpha$ controls how
impulsive the distribution is.

Suppose that $G$ and $S$ are the cumulative distribution functions
of two independent random variables $X_g$ and $X_s$ respectively;
then the characteristic function $\varphi_\epsilon$ of the random variable
$(1-\epsilon)X_g + \epsilon X_s$ is given by

$$\varphi_\epsilon(t) = \exp\Big(j\epsilon\theta t - (1-\epsilon)^2\frac{\sigma_g^2}{2}t^2 - \epsilon^\alpha\gamma_s|t|^\alpha\Big), \qquad \epsilon \in [0,1].$$

For $\alpha \in (1,2]$, all S$\alpha$S random variables have finite mean
given by their location parameter $\theta$. Moreover, it is shown in [6]
that an S$\alpha$S distribution with zero mean can be approximated by
a finite Gaussian mixture. Assuming that $S$ is zero-mean S$\alpha$S
($1 < \alpha \le 2$), $P = (1-\epsilon)G + \epsilon S$ can be approximated by
a finite Gaussian mixture, and hence the noise model (1) becomes
an $\epsilon$-contaminated Gaussian mixture noise model.

For $\alpha \in (0,1]$, all S$\alpha$S random variables have a median, and
the only S$\alpha$S distribution having a closed-form probability density
function is the Cauchy distribution ($\alpha = 1$); thus the maximum
log-likelihood principle can be applied to derive (4). A convex
combination of the mean and the LogCauchy filters can then be defined
as follows.

Definition 2 The output of the Mean-LogCauchy ($\mathrm{MLC}_\gamma$) filter with
parameter $\gamma$ is given by

$$Y_j = \mathrm{MLC}_\gamma(W_j) = (1-\lambda)\,\bar{W}_j + \lambda\,\mathrm{LC}_\gamma(W_j), \qquad \lambda \in [0,1], \tag{9}$$

where $\gamma$ is the dispersion of a Cauchy distribution.
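A minimal Python sketch of Definition 2 for a single window follows. It assumes that the LogCauchy output is the minimizer over $\theta$ of $\sum_r \log(\gamma^2 + (X_{j+r}-\theta)^2)$ and approximates that minimizer by a grid search between the smallest and largest sample; the function names, the grid search, and the grid size are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def log_cauchy(window, gamma, grid_size=2001):
    """Grid-search approximation of the LogCauchy estimate (sketch).

    Minimizes sum(log(1 + ((x - theta) / gamma)**2)) over theta, which
    has the same minimizer as sum(log(gamma**2 + (x - theta)**2)).
    gamma must be strictly positive.
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    w = np.asarray(window, dtype=float)
    thetas = np.linspace(w.min(), w.max(), grid_size)
    # Rows index candidate theta values, columns index window samples.
    cost = np.log1p(((w[None, :] - thetas[:, None]) / gamma) ** 2).sum(axis=1)
    return thetas[np.argmin(cost)]

def mean_log_cauchy(window, gamma, lam):
    """Convex combination of the window mean and the LogCauchy estimate."""
    w = np.asarray(window, dtype=float)
    return (1.0 - lam) * w.mean() + lam * log_cauchy(w, gamma)
```

For a window with one gross outlier, a small `gamma` keeps the LogCauchy estimate near the bulk of the samples, while a very large `gamma` moves it towards the sample mean, so the tuning parameter trades robustness against linearity.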


The output of the LogCauchy filter is defined as a solution of the
following maximum log-likelihood estimation problem

$$\theta_j = \arg\max_{\theta} L_\gamma(\theta; W_j) = \arg\max_{\theta} \sum_{r\in W} \log\frac{\gamma}{\pi\big(\gamma^2 + (X_{j+r}-\theta)^2\big)}, \tag{10}$$

where $L_\gamma(\theta; W_j)$ is the log-likelihood function of a Cauchy
distribution $C(\gamma,\theta)$.

It is clear that for a given $\gamma$, solving (10) is equivalent to
minimizing the function $p_\gamma(\theta; W_j)$ given by

$$p_\gamma(\theta; W_j) = \prod_{r\in W}\big(\gamma^2 + (X_{j+r}-\theta)^2\big), \tag{11}$$

as well as to solving the problem (4), since the $\log(\cdot)$ function
is strictly monotone. Thus the minimum of (4) is attained at the
same place as that of $p_\gamma(\theta; W_j)$. This is very important because
$p_\gamma(\theta; W_j)$ is a polynomial of degree $2(2N+1)$ in $\theta$ and its
characteristics can then be obtained easily. It can be shown that $p_\gamma(\theta; W_j)$
is a convex function of $\theta$ if $\gamma > [W_j]_{(2N+1)} - [W_j]_{(1)}$, and
therefore has a unique minimum $\theta_0 \in \big[[W_j]_{(1)}, [W_j]_{(2N+1)}\big]$. At
$\gamma = 0$, the function $p_\gamma(\theta; W_j)$ has distinct minima at all the points
$X_{j+r}$. If $\gamma$ is increased, the number of minima decreases. After a
certain limit of $\gamma$, there is only a unique minimum.

Proposition 2 When $\gamma \to \infty$, the Mean-LogCauchy filter
becomes the mean filter, i.e.

$$\mathrm{MLC}_\gamma(W_j) \to \bar{W}_j \quad \text{as } \gamma \to \infty.$$

Proof. Using basic properties of the argmin function, the output
of the LogCauchy filter can be expressed as

$$\begin{aligned}
\mathrm{LC}_\gamma(W_j) &= \arg\min_{\theta} \sum_{r\in W} \log\big(\gamma^2 + (X_{j+r}-\theta)^2\big) \\
&= \arg\min_{\theta} \sum_{r\in W} \gamma^2 \log\Big(1 + \frac{(X_{j+r}-\theta)^2}{\gamma^2}\Big) \\
&= \arg\min_{\theta} \sum_{r\in W} \log\Big(1 + \frac{(X_{j+r}-\theta)^2}{\gamma^2}\Big)^{\gamma^2}.
\end{aligned}$$

Since

$$\lim_{\gamma\to\infty}\Big(1 + \frac{(X_{j+r}-\theta)^2}{\gamma^2}\Big)^{\gamma^2} = \exp\big\{(X_{j+r}-\theta)^2\big\},$$

and the exponential function $\exp\{\cdot\}$ is monotonically increasing,
it follows that

$$\mathrm{LC}_\gamma(W_j) \to \arg\min_{\theta} \sum_{r\in W} (X_{j+r}-\theta)^2 \quad \text{as } \gamma \to \infty.$$

This concludes the proof using (2) and (9). $\blacksquare$

Note that asymptotically, the tuning parameter $\gamma$ transforms a
nonlinear filter to a linear one.

5. EXPERIMENTAL RESULTS


This section presents simulation results where the proposed filters
are applied to enhance images corrupted by mixed Gaussian and
heavy-tailed noise. The performance of a filter clearly depends
on the filter type and its sliding window size, the properties of the
signals/images, and the characteristics of the noise. The choice of
criteria by which to measure the performance of a filter presents
certain difficulties. In particular, it is clear that a global performance
measure such as the mean square error only gives a partial picture
of reality: for instance, one filter may do very well at the nominal
model but badly at an outlier, while another may do poorly at the
nominal model but well at an outlier, and yet the two could have the
same mean square value. Another important performance measure
is the mean absolute error, which tends to downplay the
influence of large errors compared to the mean square error, precisely
in the presence of heavy-tailed noise.

Mean square error (MSE) between the filtered and the original
image is evaluated to compare quantitatively the performance
of the proposed filters with other filtering techniques.

The scale-contaminated Gaussian and Laplace distributions are
relatively light-tailed. The S$\alpha$S distributions are very heavy-tailed
noise distributions. The Cauchy distribution is a member of this
family ($\alpha = 1$), whose variance is infinite. To assess the
performance of Mean-LogCauchy filters in mixed noise, the original
image in Fig. 1(a) was contaminated by both Gaussian white
noise ($\sigma^2 = 100$) and $\alpha$-stable noise S$\alpha$S ($\alpha = 0.5$). The
$\epsilon$-contaminated mixed noise corrupted image is shown in Fig. 1(b).
The visual comparison with other techniques is shown in Fig. 1.
The relaxed median filter [7] outperforms Wilcoxon and
Hodges-Lehmann in suppressing highly $\alpha$-stable noise, while the
Mean-LogCauchy filter, with mixture parameter $\lambda = \pi/(2+\pi)$ and
optimal tuning parameter $\gamma = 2.38$, achieves the best performance.
In the simulation results of Fig. 1, the contamination fraction $\epsilon$ is
chosen to be equal to $\lambda$.

The high sensitivity of many specific filters to an accurate
modeling of noise that is to be removed led us to investigate the
proposed new techniques that include a number of filters whose
optimality when given a specific noise distribution is attained by
merely adjusting or optimizing the parameter $\lambda$. On the other hand,
the filtering performance is also sensitive to the fraction of
contamination $\epsilon$. When $\epsilon = 0$ the mixed noise is purely Gaussian, and
when $\epsilon = 1$ it is purely $\alpha$-stable. Fig. 2 shows the influence of the
parameter $\epsilon$ on the filtering performance.

Fig. 1. Filtering results in the presence of $\epsilon$-contaminated Gaussian and $\alpha$-stable noise, and using a 3 x 3 square window: (a) Original
image, (b) $\epsilon$-mixed noisy image with $\mathcal{N}(0,100)$ and S$\alpha$S, (c) Output of the MLC filter, $\lambda = 2/(2+\pi)$, (d) Output of the relaxed median
filter, (e) Output of the Wilcoxon filter, and (f) Output of the Hodges-Lehmann filter.


Fig. 2. Influence of the contamination fraction $\epsilon$ on filtering
performance: MSE vs. $\epsilon$.

ABSTRACT

A  new  mixed-cost  receiver  for  direct-sequence  code-division 
multiple  access  (DS-CDMA)  systems  is  proposed.  An  adap¬ 
tive  mixing  function  is  introduced  to  combine  the  constrained 
minimum  output  energy  (CMOE)  and  constant  modulus  (CM) 
criteria  together.  Simulations  confirm  the  near-far  resistance 
of  the  proposed  receiver  over  a  wide  range  of  near-far  situ¬ 
ations. 

1.  INTRODUCTION 

The constrained minimum output energy (CMOE) criterion
[1, 2] is widely known as an effective interference cancellation
scheme for code-division multiple access (CDMA) systems.
This feature is emphasised when the channel exhibits
a near-far environment: the situation when one or more
interfering users have greater power than the desired user. The
performance of the CMOE receiver degrades, however, in
high signal to noise ratio (SNR) situations and by distortion
of the received signals due to multipath fading channels [1].
In [2], the constraint proposed in [1] is replaced by a code
constraint matrix to retain the output energy of the desired
user at a particular path delay. Although this new scheme
prevents the cancellation of the desired signal and sidesteps
the use of an explicit constraint on the orthogonal vector [1],
the performance of the CMOE receiver still degrades either
in the case of low interference power or when the number
of multipaths is extended [3].

The constant modulus algorithm (CMA) receiver performs
better in (inverse) channel estimation and provides
near-Wiener receiver performance [4]. However, since its
cost surface is multimodal, the CM criterion possibly possesses
some undesirable local minima which in some cases
correspond to the solutions of interfering users. Good
initialisation for a CMA receiver can help evade these local
minima and accelerate the convergence of the algorithm. In
severe near-far environments, a pre-whitening process of the
received signal is indispensable despite its excessive
computational complexity [5].

This  paper  concerns  the  exploitation  of  the  salient  fea¬ 
tures  of  both  criteria  to  produce  a  near-far  resistant  receiver 
which  can  be  operated  in  multipath  fading  channels  with 
a  wide-range  of  near-far  levels.  The  proposed  algorithm 
jointly  updates  the  receiver  weight  vector  by  adaptively  min¬ 
imising  a  mixed-cost  function.  The  mixing  parameter  is  also 
adapted  according  to  the  near-far  level.  Simulation  results 
are  provided  to  show  the  signal  to  interference  plus  noise  ra¬ 
tio  (SINR)  performance  of  the  proposed  combining  scheme 
compared  to  those  of  the  existing  algorithms.  It  is  shown 
that  the  mixed-cost  scheme  is  superior  in  terms  of  SINR  lev¬ 
els  over  a  wide-range  of  near-far  levels  in  multipath  fading 
channels. 


2.  SYSTEM  MODEL 

For the real system model, the baseband received signal for
a $K$-user asynchronous CDMA channel is defined as

$$r(t) = \sum_{i=-\infty}^{\infty}\sum_{k=1}^{K} A_k b_k[i]\,c_k(t - iT - \tau_k) + v(t), \tag{1}$$

where $A_k$ represents the received amplitude of the $k$th user.
The data bits $b_k[i]$ are independent identically distributed
(i.i.d.) and assumed to be drawn from the finite alphabet
$\{-1,+1\}$. The symbol period is denoted by $T$. The spreading
(or signature) waveform of the $k$th user $c_k(t)$ is
$L_c$-dimensional and has unit energy, i.e., $\|\mathbf{c}_k\|^2 = 1$,
and $\tau_k \in [0,T)$ are the relative offsets of the asynchronous
signals at the receiver. The zero-mean additive white Gaussian
channel noise $v(t)$ has constant power spectral density
$\sigma^2$. If we incorporate the amplitude $A_k$ and delay $\tau_k$ in the
channel response $h_k(t)$, we can replace the spreading code
sequence $c_k(t)$ with the discrete-time combined channel-code
response

$$g_k[l] = \sum_{i=0}^{L_c-1} c_k[i]\,h_k[l-i], \tag{2}$$

where $c_k[i]$ is the $i$th element of the code vector for the $k$th
user, $\mathbf{c}_k = (c_k[0],\ldots,c_k[L_c-1])^T$.

The continuous-time received signal $r(t)$ is sampled to
form a length-$L_f$ received signal vector at the $n$th observation,
where $L_f$ is the length of a receiver for the $k$th user
with tap-weight vector $\mathbf{f}_k$,

$$\mathbf{r}[n] = (r[nN + L_f - 1],\ldots,r[nN])^T. \tag{3}$$

The received signal $r(t)$ in (1) can then be formulated
in matrix-vector form as

$$\mathbf{r}[n] = \sum_{k=1}^{K}\mathbf{r}_k[n] + \mathbf{v}[n] = \sum_{k=1}^{K}\mathbf{G}_k\mathbf{b}_k[n] + \mathbf{v}[n], \tag{4}$$

where $\mathbf{G}_k$ is the combined code-channel response matrix of
the $k$th user and $\mathbf{b}_k[n] = (b_k[n + L_b - 1],\ldots,b_k[n])^T$ with
Lb  =  \Lf+^h~l]  and v[n]  =  (v[nN]+Lf-l],  ■  ■  •  ,  u[nAf])T. 
Note  that 

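$$\mathbf{G}_k = \mathbf{C}_k\mathbf{H}_k, \tag{5}$$

where the $L_f \times L_bL_h$ code matrix $\mathbf{C}_k$ and the $L_bL_h \times L_b$
channel matrix $\mathbf{H}_k$ are defined as

$$\mathbf{C}_k = \begin{pmatrix} c_k[L_c-1] & \cdots & c_k[0] & & \mathbf{0} \\ & \ddots & & \ddots & \\ \mathbf{0} & & c_k[L_c-1] & \cdots & c_k[0] \end{pmatrix}, \qquad \mathbf{H}_k = \begin{pmatrix} \mathbf{h}_k & & \mathbf{0} \\ & \ddots & \\ \mathbf{0} & & \mathbf{h}_k \end{pmatrix},$$

where the channel response vector for the $k$th user has
length $L_h$, i.e., $\mathbf{h}_k = (h_k[L_h-1],\ldots,h_k[0])^T$. For brevity,
we shall consider the first user as the desired user and drop
the subscript $k$ in all variables involving the first user.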

3.  MIXED-COST  ALGORITHM 

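Consider a combined cost function

$$J(\mathbf{f},\lambda) = \lambda J_{\mathrm{CMOE}}(\mathbf{f}) + (1-\lambda)J_{\mathrm{CM}}(\mathbf{f}),$$

where

$$J_{\mathrm{CMOE}}(\mathbf{f}) = E\{\mathbf{f}^T\mathbf{R}\mathbf{f}\}, \qquad J_{\mathrm{CM}}(\mathbf{f}) = E\{((\mathbf{f}^T\mathbf{r})^2 - 1)^2\} \tag{6}$$

are the CMOE [2] and the CM [6] cost functions respectively,
$\lambda \in [0,1]$ is the mixing parameter and $\mathbf{R} = E\{\mathbf{r}\mathbf{r}^T\}$.
The CMOE criterion [2] is given by

$$\min_{\mathbf{f}} E\{\mathbf{f}^T\mathbf{R}\mathbf{f}\} \quad \text{subject to} \quad \mathbf{f}^T\mathbf{C} = \mathbf{1}^T,$$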


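where $\mathbf{1} = (1,0,\ldots,0)^T$ since the first path is assumed to
be the dominant path, and the gradient of the CMOE cost is
given by [2]

$$\left.\frac{\partial J_{\mathrm{CMOE}}(\mathbf{f})}{\partial\mathbf{f}}\right|_{\mathbf{f}=\mathbf{f}[n]} = z[n]\,\mathbf{\Pi}_C^{\perp}\mathbf{r}[n], \tag{7}$$

where $z[n] = \mathbf{f}^T[n]\mathbf{r}[n]$ is the output of the receiver and
$\mathbf{\Pi}_C^{\perp} = \mathbf{I} - \mathbf{C}(\mathbf{C}^T\mathbf{C})^{-1}\mathbf{C}^T$ denotes the projection matrix
onto the nullspace of $\mathbf{C}$. The CMA receivers, i.e., the
locations in receiver parameter space of the local minima of
$J_{\mathrm{CM}}(\mathbf{f})$, are found by means of the CMA algorithm [6] which
searches adaptively for the zero of the gradient

$$\left.\frac{\partial J_{\mathrm{CM}}(\mathbf{f})}{\partial\mathbf{f}}\right|_{\mathbf{f}=\mathbf{f}[n]} = z[n](z^2[n] - 1)\,\mathbf{r}[n]. \tag{8}$$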

For  the  derivation  of  a  CMA  receiver,  two  important  points 
need  to  be  mentioned.  First,  it  should  be  noted  that  the  ini¬ 
tialisation  of  a  CMA  receiver  is  crucial  for  the  convergence 
to  the  solution  of  a  desired  user.  In  [5],  a  timing  acqui¬ 
sition  scheme  of  a  desired  user  is  proposed  in  order  to  be 
used  as  an  initialisation  of  a  CMA  equaliser.  Collectively, 
this  acquisition-equalisation  process  is  called  the  minimum- 
entropy  CMA  (ME-CMA)  receiver  [5].  When  the  received 
signals  are  not  pre-whitened  in  the  equalisation  process,  the 
ME-CMA  receiver  is  essentially  a  conventional  CMA  re¬ 
ceiver. 

Second, the CMA receiver is shown to converge faster
if a constraint is imposed on the received signals, as shown
in [7]. However, it can be shown that the CMA receiver
still converges to a desired solution without the requirement
of any constraint as long as an appropriate initialisation is
used [5]. In the derivation of the mixed-cost receiver,
therefore, we do not impose any constraint upon the constant
modulus derivative.


3.1.  Weight  vector  update  equation 

Following the derivation of the algorithms for both criteria,
the update equation of the mixed-cost CMOE-CMA receiver
weight vector $\mathbf{f}[n]$ is given by

$$\begin{aligned}
\mathbf{f}[n+1] &= \mathbf{f}[n] - \mu\left.\frac{\partial J(\mathbf{f},\lambda)}{\partial\mathbf{f}}\right|_{\mathbf{f}=\mathbf{f}[n]} \\
&= \mathbf{f}[n] - \mu\big(\lambda z[n]\,\mathbf{\Pi}_C^{\perp}\mathbf{r}[n] + \beta(1-\lambda)z[n](z^2[n]-1)\,\mathbf{r}[n]\big),
\end{aligned} \tag{9}$$

where $\mu$ is the stepsize of the mixed-cost algorithm. For
best operation of this algorithm, it is necessary to weight
the constant modulus derivative, which in effect modifies the
mixture in (6) and can be explained by the nonhomogeneity
of the two costs. This is realised by introducing the scaling
factor $\beta$ for the constant modulus derivative in (9). Note that
for the case of $\lambda = 1$, equation (9) is the update equation
of the CMOE receiver [2], and when $\lambda = 0$, the mixed-cost
receiver is essentially the CMA receiver [6].
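The weight update (9) can be sketched in a few lines of Python. The sketch assumes real-valued signals, a constraint matrix `C` with full column rank, and the projector $\mathbf{I} - \mathbf{C}(\mathbf{C}^T\mathbf{C})^{-1}\mathbf{C}^T$; the function names `nullspace_projector` and `mixed_cost_weight_update` are hypothetical and chosen only for this illustration.

```python
import numpy as np

def nullspace_projector(C):
    """Return I - C (C^T C)^{-1} C^T for a full-column-rank matrix C."""
    return np.eye(C.shape[0]) - C @ np.linalg.solve(C.T @ C, C.T)

def mixed_cost_weight_update(f, r, P, lam, mu, beta):
    """One stochastic-gradient step of the mixed CMOE/CM cost (sketch).

    f: current weight vector, r: received vector, P: projector from
    nullspace_projector, lam: mixing parameter, mu: stepsize,
    beta: scaling factor for the constant modulus derivative.
    """
    z = f @ r                               # receiver output z[n]
    grad_cmoe = z * (P @ r)                 # projected CMOE gradient
    grad_cm = z * (z ** 2 - 1.0) * r        # constant modulus gradient
    return f - mu * (lam * grad_cmoe + beta * (1.0 - lam) * grad_cm)
```

With `lam = 1` the correction lies in the span of the projector, so the product of `C.T` with the weight vector is left unchanged by the step; with `lam = 0` only the scaled constant modulus term remains.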


3.2.  Update  equations  for  the  mixing  parameter 


The main objective of the derivation of the mixed-cost
algorithm is to jointly exploit the features of the two criteria
in various near-far environments. Therefore, the mixing
parameter $\lambda$ is replaced by the time-varying version $\lambda[n]$ in
order to track the variation of the channel.

We adopt the multi-step method as described in [8] for
the update of the mixing parameter $\lambda[n]$, in a similar manner
as for adapting the gain in the adaptive gain algorithm.
The adaptation of the mixing parameter $\lambda[n]$ is obtained by
applying a second LMS-type algorithm to adaptively minimise
$J(\mathbf{f},\lambda)$ with respect to $\lambda$. The stochastic gradient
update equation for $\lambda[n]$ is given by


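$$\begin{aligned}
\lambda[n+1] &= \left[\lambda[n] - \alpha\left.\frac{\partial J(\mathbf{f},\lambda)}{\partial\lambda}\right|_{\lambda=\lambda[n]}\right]_{\lambda_-}^{\lambda_+} \\
&= \Big[\lambda[n] - \alpha\Big(z[n]^2 - (z[n]^2-1)^2 \\
&\qquad + \big\{2\lambda[n]z[n]\mathbf{r}^T[n] + 4(1-\lambda[n])(z[n]^2-1)z[n]\mathbf{r}^T[n]\big\}\boldsymbol{\psi}[n]\Big)\Big]_{\lambda_-}^{\lambda_+},
\end{aligned} \tag{10}$$

where $\alpha$ is the adaptation rate, $[\,\cdot\,]_{\lambda_-}^{\lambda_+}$ denotes truncation
to the limits of the range $[\lambda_-,\lambda_+]$, and $0 \le \lambda_- < \lambda_+$. $\boldsymbol{\psi}[n]$
represents the derivative $\partial\mathbf{f}[n]/\partial\lambda|_{\lambda=\lambda[n]}$. From (9), the
update equation of $\boldsymbol{\psi}[n]$ is given by

$$\begin{aligned}
\boldsymbol{\psi}[n+1] &= \Big(\mathbf{I} - \mu\big(\lambda[n]\mathbf{\Pi}_C^{\perp}\mathbf{r}[n]\mathbf{r}^T[n] + \beta(1-\lambda[n])(3z^2[n]-1)\mathbf{r}[n]\mathbf{r}^T[n]\big)\Big)\boldsymbol{\psi}[n] \\
&\quad - \mu\big(z[n]\mathbf{\Pi}_C^{\perp}\mathbf{r}[n] - \beta(z^2[n]-1)z[n]\mathbf{r}[n]\big).
\end{aligned} \tag{11}$$

Equation (9) together with (10) and (11) constitute the new
mixed-cost CMOE-CMA algorithm. The structure of the
proposed receiver is shown in Fig. 1.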
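One iteration of the complete recursion (weight vector, mixing parameter, and sensitivity vector) can be sketched in Python as follows. The sketch assumes real-valued signals and takes the sensitivity recursion to be the derivative of the weight update (9) with respect to the mixing parameter; the function name `adaptive_mixed_cost_step` and its argument names are hypothetical.

```python
import numpy as np

def adaptive_mixed_cost_step(f, psi, lam, r, P, mu, beta, alpha,
                             lam_min=0.0, lam_max=1.0):
    """One iteration of the mixed-cost recursion (illustrative sketch).

    f: weights, psi: derivative of f with respect to lam, lam: mixing
    parameter, r: received vector, P: projector onto the nullspace of
    the constraint matrix, mu/alpha: stepsizes, beta: CM scaling factor.
    """
    z = f @ r
    Pr = P @ r
    # Stochastic gradient of the mixed cost with respect to lam.
    dJ_dlam = (z ** 2 - (z ** 2 - 1.0) ** 2
               + (2.0 * lam * z + 4.0 * (1.0 - lam) * (z ** 2 - 1.0) * z) * (r @ psi))
    lam_next = float(np.clip(lam - alpha * dJ_dlam, lam_min, lam_max))
    # Weight update with the current mixing parameter.
    f_next = f - mu * (lam * z * Pr + beta * (1.0 - lam) * z * (z ** 2 - 1.0) * r)
    # Sensitivity update: derivative of the weight update w.r.t. lam.
    A = lam * np.outer(Pr, r) + beta * (1.0 - lam) * (3.0 * z ** 2 - 1.0) * np.outer(r, r)
    psi_next = psi - mu * (A @ psi) - mu * (z * Pr - beta * (z ** 2 - 1.0) * z * r)
    return f_next, psi_next, lam_next
```

The clipping step plays the role of the truncation to the admissible range of the mixing parameter, and all three quantities are updated from values at time n so that the order of the three assignments does not matter.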


3.3.  Computational  complexity  and  convergence  prop¬ 
erties 

The computational complexity of the algorithm is $L_f^2 + 16L_f + 12$
multiplications and $L_f^2 + 9L_f$ additions. Since the global
convergence property of the CMOE has been given in [2] and local
convergence of CMA has been shown in [4], with careful choice of
$\mu$ and $\lambda[n]$, the combined algorithm should demonstrate at
least similar convergence properties.


Fig.  1.  The  mixed-cost  CMOE-CMA  receiver. 


4.  SIMULATIONS 

We considered a symbol-asynchronous system with processing
gain $L_c = 31$ and number of users $K = 7$. User delays
were uniformly distributed over $[0, 7T_c)$ and then kept
fixed. The propagation channels are bandlimited with
root-raised-cosine pulse shaping with excess bandwidth 0.2. The
number of multipath rays was three, where the last two rays
were uniformly distributed in delay over $[0, 7T_c)$. The channel
length for all users was $10T_c$. We assumed without loss
of generality that the first user is the user of interest with
unity power. The timing of the first user was assumed to
be known. The background noise was zero-mean AWGN
with SNR = 20 dB (referenced to the first user). Each receiver
was length-$2L_c$. The initial value of $\lambda[n]$ was set to
unity and $\lambda_-$ and $\lambda_+$ were 0 and 1 respectively. The adaptation
rate $\alpha$ was $5 \times 10^{-4}$ and $\boldsymbol{\psi}[0]$ was $0.1[1,\ldots,1]^T$.
The performance measure was the averaged SINR in dB and
all SINR plots were averaged over 100 Monte-Carlo runs.
We compared the performances of the CMA receiver, the
CMOE receiver [2] and the proposed mixed-cost CMOE-CMA
receiver. We tested the performances of the three receivers
in various settings of the near-far situations, which
can be quantified in terms of the near-far ratio (NFR), where
$\mathrm{NFR} = 10\log_{10}(A_k^2/A_1^2)$, $\forall k \in \{2,\ldots,7\}$. Fig. 2 (a) and (b)
show the averaged SINR plots of the three receivers at NFR =
14 dB and 26 dB respectively. It is observed that the performance
of the CMA receiver is degraded in high NFR cases
because the attraction basin of the desired user is likely to
reduce in dimension as the NFR increases. The CMOE
receiver reveals the characteristic of near-far resistance as
shown in Fig. 2 (b) but is inferior to the CMA receiver in
the low NFR cases as in Fig. 2 (a). In both cases, the
mixed-cost receiver is superior to the two existing receivers.
The steady-state averaged SINR plots at various NFRs are
shown in Fig. 3. For the mixed-cost receiver, high SINR
levels were maintained over a wide range of near-far levels,
confirming its near-far resistance characteristic. Time
evolution plots of $\lambda[n]$ for different NFRs are shown in Fig. 4.
Notice that the relaxation rate is varied as a function of the
NFR settings. For low NFRs, $\lambda[n]$ decays quickly to zero
whereas its magnitude is sustained at high levels for high
NFRs.

Fig. 2. The comparison of SINR performances of three receivers
at (a) NFR = 14 dB and (b) NFR = 26 dB.


5.  CONCLUSION 

We have presented a new mixed-cost receiver structure for
DS-CDMA systems based on the CMOE and CM criteria.
The multi-step method is exploited in the derivation of
the adaptive mixing parameter algorithm. Simulations have
confirmed that the averaged SINR performance of the proposed
mixed-cost algorithm in various near-far situations is
superior to the existing algorithms. On-going research is
focused upon the evaluation of this method in the presence of
time-varying interference.

ABSTRACT

CDMA  systems  need  simultaneous  multiple  access  inter¬ 
ference  suppression  and  adaptive  interference  suppression 
filters  that  may  span  three  symbols.  Thus,  a  large  number 
of  filter  coefficients  need  to  be  estimated.  By  the  use  of 
reduced  rank  filtering,  it  is  possible  to  lower  the  number 
of  required  filter  coefficients  with  a  small  decrease  in  per¬ 
formance.  In  [1],  a  successful  attempt  for  non-dispersive 
CDMA  signals  was  made  to  develop  a  reduced  rank  al¬ 
gorithm  based  on  the  multistage  Wiener  (MSW)  filter  of 
Goldstein and Reed [2]. In this paper, motivated by MSW,
we propose a reduced rank decorrelating RAKE receiver for
dispersive CDMA signals. The proposed receiver is blindly
implemented in a lower dimensional space, relative to the
full-rank receivers, without the aid of training sequences
and the channel information. By exploiting the structure
of the user signature waveform, the proposed receivers
exhibit performance close to that of the reduced rank MMSE
receiver implemented with the desired user's known channel
information.

1.  INTRODUCTION 

In  CDMA  systems  orthogonality  between  waveforms  may 
not  be  protected  because  of  the  random  delay  of  the  users 
and  the  frequency-selective  fading  environment.  It  is  well 
known  that  a  simple  matched  filter  suffers  from  increased 
cross  correlations  between  user’s  signature  waveforms  in 
fading  and  results  in  a  poor  performance. 

Multiuser  detection  provides  an  eminent  solution  to  re¬ 
instate  the  CDMA  systems’  performance  promised  by  the 
use  of  orthogonal  waveforms.  Most  existing  multiuser  de¬ 
tectors  such  as  the  linear  receiver  require  the  knowledge  of 
the  users’  (or  at  least  the  desired  user’s)  signature  wave¬ 
forms.  However,  signature  waveforms  vary  with  the  multi- 
path  channel  characteristics.  Although  the  signature  wave¬ 
form  can  be  estimated  periodically  using  training  sequences, 
this  may  not  be  affordable  in  a  fast  changing  wireless  prop¬ 
agation  environment. 

Blind multiuser receivers based on subspace decomposition
can mitigate possible multipath effects and channel
dispersion [3]. However, these receivers require heavy
computation due to the use of subspace decomposition.
Motivated by the low complexity of RAKE receivers, Liu and
Li [4] proposed a decorrelating RAKE receiver that provides
a performance close to that of the MMSE receiver with the
desired user's known channel information. Decorrelating
RAKE and MMSE receivers are designed to suppress multiuser
interference; therefore, they still require knowledge
of the observed-data covariance matrix inverse to estimate
a large number of filter coefficients.

In  [1],  a  reduced  rank  minimum  mean  squared  error 
(MMSE)  receiver  based  on  the  multistage  Wiener  (MSW) 
filter  is  proposed  for  non-dispersive  CDMA  signals  in  order 
to  lower  the  number  of  required  filter  coefficients  especially 
for  the  cases  where  the  processing  gain  is  larger  than  the 
dimension  of  the  signal  subspace.  Reduced  rank  receivers 
are  concerned  with  reduction  in  dimensionality  of  the  ob¬ 
served  data.  Thus,  the  purpose  of  a  reduced  rank  receiver 
is  to  obtain  near  full-rank  performance  with  a  filter  order 
smaller  than  the  signal  subspace.  The  important  feature 
of  the  reduced  rank  MSW  filter  in  [1]  is  that  it  does  not 
rely  on  eigen  decomposition  and  hence  its  low  complexity. 
However,  the  algorithm  is  applicable  when  the  signature 
waveform  of  the  user  of  interest  is  available  to  the  receiver 
or  the  training  sequences  are  employed. 

In  this  paper,  we  propose  a  reduced  rank  MSW  decor¬ 
relating  RAKE  receiver  which  exploits  the  structure  of  the 
user’s  spreading  waveform  in  multipath.  The  performance 
of  the  proposed  RAKE  receiver  is  compared  by  computer 
simulations  to  the  eigen-based  cross-spectral  methods  of 
rank  reduction  and  the  full  rank  decorrelating  RAKE  re¬ 
ceiver.  The  simulation  results  demonstrate  that  the  pro¬ 
posed  method  can  outperform  eigen-based  methods  without 
utilizing  eigen  decomposition  and  provide  a  performance 
similar  to  reduced-rank  MSW  receiver  with  the  known  sig¬ 
nature  waveform. 

The  rest  of  the  paper  is  organized  as  follows.  Section  2 
gives  a  description  of  the  signal  model.  Section  3  introduces 
the  reduced  rank  decorrelating  RAKE  receiver.  Section  4 
discusses  its  implementation  based  on  MSW.  Simulation 
results  and  conclusions  are  given  in  Section  5  and  6,  re¬ 
spectively. 

2.  DATA  MODEL 

We consider a $P$-user asynchronous CDMA (A-CDMA) baseband
signal sampled at the chip rate:

$$x(l) = \sum_{i=1}^{P}\sum_{n=-\infty}^{\infty} s_i(n)\,g_i(l - nL_c) \tag{1}$$

where $\{s_i(n)\}$ are the information symbols and the spreading
waveform distorted by the time dispersive channel is
given by

$$g_i(l) = \sum_{k=1}^{L} h_i(k)\,c_i(l - k + k_i), \qquad l = 1,\ldots,2L_c \tag{2}$$

where $k_i$ is the known chip delay index. The above signature
waveform can be written in a vector form. We assume that
$L < L_c$ so that a window spanning two symbols is enough to
fully observe the signature waveform regardless of the delay
index value. The vector $\mathbf{g}_i = \mathbf{C}_i\mathbf{h}_i$ is given by the channel
vector $\mathbf{h}_i$ multiplied by the $2L_c \times L$ Toeplitz filtering matrix
constructed from the code $c_i$,


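$$\mathbf{C}_i = \begin{pmatrix}
0 & \cdots & 0 \\
\vdots & & \vdots \\
c_i(1) & & 0 \\
\vdots & \ddots & \\
c_i(L_c) & & c_i(1) \\
& \ddots & \vdots \\
0 & & c_i(L_c) \\
\vdots & & \vdots \\
0 & \cdots & 0
\end{pmatrix}$$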

It is seen that $\{\mathbf{g}_i\}$ are uniquely determined by the unknown
channel vectors $\{\mathbf{h}_i\}$. Using MATLAB notation,
define

$$\mathbf{g}_i(m) = \mathbf{g}_i((m-1)L_c + 1 : mL_c), \qquad m = 1, 2. \tag{4}$$
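The construction of the signature vector from the Toeplitz filtering matrix and its split into two symbol-length halves can be sketched in Python. The sketch assumes real-valued codes, a zero-based delay index, and a channel short enough for the delayed waveform to fit inside the two-symbol window; the function name `signature_waveform` is hypothetical.

```python
import numpy as np

def signature_waveform(code, channel, delay):
    """Build the 2*Lc x L Toeplitz filtering matrix and g = C h (sketch).

    code: spreading code of length Lc, channel: L channel taps,
    delay: zero-based chip delay (an assumption of this sketch).
    Returns the matrix C, the vector g, and its two halves g(1), g(2).
    """
    code = np.asarray(code, dtype=float)
    channel = np.asarray(channel, dtype=float)
    Lc, L = len(code), len(channel)
    if delay < 0 or delay + L - 1 + Lc > 2 * Lc:
        raise ValueError("delayed waveform does not fit in a two-symbol window")
    C = np.zeros((2 * Lc, L))
    for k in range(L):
        # Column k holds the code shifted down by delay + k chips.
        C[delay + k: delay + k + Lc, k] = code
    g = C @ channel
    return C, g, g[:Lc], g[Lc:]
```

Each column of the matrix is a shifted copy of the code, so the product with the channel vector equals the convolution of code and channel placed at the given delay.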

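If we desire the first user's signal as a signal of interest,
then, stacking data samples by a window that spans two
symbols [3] provides

$$\mathbf{x}(n) = \mathbf{g}_1 s_1(n) + \mathbf{z}(n) \tag{5}$$

where $\mathbf{z}(n)$ is given by

$$\mathbf{z}(n) = \begin{bmatrix} \mathbf{g}_1(2) & \mathbf{0} \\ \mathbf{0} & \mathbf{g}_1(1) \end{bmatrix}\begin{bmatrix} s_1(n-1) \\ s_1(n+1) \end{bmatrix} + \sum_{i=2}^{P}\mathbf{G}_i\mathbf{s}_i(n) + \mathbf{o}(n) \tag{6}$$

and $\mathbf{G}_i$ and $\mathbf{s}_i(n)$ are defined by

$$\mathbf{G}_i = \begin{bmatrix} \mathbf{g}_i(2) & \mathbf{g}_i(1) & \mathbf{0} \\ \mathbf{0} & \mathbf{g}_i(2) & \mathbf{g}_i(1) \end{bmatrix}, \qquad \mathbf{s}_i(n) = \begin{bmatrix} s_i(n-1) \\ s_i(n) \\ s_i(n+1) \end{bmatrix}. \tag{7}$$

Note that $\mathbf{z}(n)$ contains the ISI components for the signal
of interest, the MUI, and the background noise $\mathbf{o}(n)$.


3. REDUCED RANK DECORRELATING RAKE RECEIVER

The signature waveform of the desired user is a linear
combination of the columns of the Toeplitz filtering matrix $\mathbf{C}_1$
that specify the constraints in the minimum output energy
(MOE) receiver [4]. Omitting the desired user's subscript,
the problem is defined as

$$\mathbf{w}_l = \arg\min_{\mathbf{w}_l} \mathbf{w}_l^H\mathbf{R}_{xx}\mathbf{w}_l \quad \text{s.t.} \quad \mathbf{C}^H\mathbf{w}_l = \mathbf{1}_l \tag{8}$$

where $\mathbf{R}_{xx} = E\{\mathbf{x}(n)\mathbf{x}^H(n)\}$ is the $2L_c \times 2L_c$ data covariance
matrix, $\mathbf{w}_l$ is the weight vector corresponding to the $l$th
arm of the RAKE receiver, and $(\cdot)^H$ represents complex
conjugate transpose. The closed form optimal solution to
the problem can be obtained via Lagrange multipliers

$$\mathbf{w}_{l,opt} = \mathbf{R}_{xx}^{-1}\mathbf{C}(\mathbf{C}^H\mathbf{R}_{xx}^{-1}\mathbf{C})^{-1}\mathbf{1}_l. \tag{9}$$

This solution is computationally complex in the sense that
it requires a $2L_c \times 2L_c$ matrix inversion. In order to simplify
the computation of the weight vector given in (9),
the projection matrix $\mathbf{P}_C = \mathbf{C}(\mathbf{C}^H\mathbf{C})^{-1}\mathbf{C}^H$ can be used to
decompose this vector into two orthogonal components as
follows:

$$\mathbf{w}_l = \mathbf{w}_{c,l} - \mathbf{M}^H\mathbf{w}_{a,l} \tag{10}$$

where

$$\mathbf{w}_{c,l} = \mathbf{P}_C\mathbf{w}_{l,opt} = \mathbf{C}(\mathbf{C}^H\mathbf{C})^{-1}\mathbf{1}_l \tag{11}$$

is not data dependent and $\mathbf{M}$ is a $(2L_c - L) \times 2L_c$ blocking
matrix which is chosen to satisfy $\mathbf{M}\mathbf{C} = \mathbf{0}$, and therefore
guarantees that the second component is orthogonal to the first
component [5]. Note that $\mathbf{w}_{a,l}$ is the data dependent weight
vector and the size of $\mathbf{w}_{a,l}$ dominates the complexity of the
receiver. By projecting the blocked data $\mathbf{y}(n) = \mathbf{M}\mathbf{x}(n)$
onto a lower dimensional subspace, the number of filter taps
needed to compute $\mathbf{w}_{a,l}$ can be reduced. As shown in Figure 1,
let $\mathbf{Q}_l$ for each RAKE finger be a $D \times (2L_c - L)$
matrix where $D < (2L_c - L)$. In this case the rows of $\mathbf{Q}_l$ form
a basis for the lower dimensional subspace. The new
optimization problem involving the new reduced rank adaptive
vector becomes

$$\mathbf{w}_{r,l} = \arg\min_{\mathbf{w}_{r,l}} E\{|d_l(n) - \mathbf{w}_{r,l}^H\mathbf{Q}_l\mathbf{M}\mathbf{x}(n)|^2\} \tag{12}$$

where $d_l(n) = \mathbf{w}_{c,l}^H\mathbf{x}(n)$ and in turn $\mathbf{w}_{a,l}$ is computed by

$$\mathbf{w}_{a,l} = \mathbf{Q}_l^H\mathbf{w}_{r,l}. \tag{13}$$

The solution to the optimization problem in (12) is given by

$$\mathbf{w}_{r,l} = \mathbf{R}_{Q_l}^{-1}\mathbf{r}_{Q_l} \tag{14}$$

where

$$\mathbf{R}_{Q_l} = E\{\mathbf{Q}_l\mathbf{M}\mathbf{x}(n)\mathbf{x}^H(n)\mathbf{M}^H\mathbf{Q}_l^H\} \tag{15}$$

$$= \mathbf{Q}_l\mathbf{M}\mathbf{R}_{xx}\mathbf{M}^H\mathbf{Q}_l^H \tag{16}$$

and

$$\mathbf{r}_{Q_l} = E\{\mathbf{Q}_l\mathbf{M}\mathbf{x}(n)d_l^*(n)\} \tag{17}$$

$$= \mathbf{Q}_l\mathbf{M}\mathbf{R}_{xx}\mathbf{w}_{c,l}. \tag{18}$$

This blind reception problem can be solved for each $\mathbf{Q}_l$ and
$\mathbf{w}_{r,l}$ using the eigen-based reduced rank methods and the
multistage decomposition [1]. The filter outputs $z_l = \mathbf{w}_l^H\mathbf{x}$
can then be coherently combined to obtain the final signal
estimate using an estimate of the principal vector of $\mathbf{R}_{zz}$.
The computational effort for coherent combining is on the
order of $O(L^2)$, which is insignificant since $L < 2L_c$.


3.1. Eigen-Based Methods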

Eigen-based  methods  have  been  extensively  used  in  order  to 
obtain  a  lower  dimensional  subspace  for  the  received  data  [1, 
3].  These  methods  are  based  on  the  eigen-decomposition  of 
the  covariance  matrix 

Ryy  =  E{y(n)y(n)H}  =  VAVH  (19) 
where V is the orthonormal matrix of eigenvectors of $R_{yy}$ and $\Lambda$ is the diagonal matrix of eigenvalues.

Figure 1: Reduced rank decorrelating RAKE receiver.

In the principal components (PC) method, the rows of $Q_l$ are determined by the D principal eigenvectors of $R_{yy}$, which are associated with the D largest eigenvalues. The performance of this method degrades rapidly when D is smaller than the dimension of the signal subspace in the blocked data y(n).

An alternative to PC is the Cross Spectral (CS) method. The CS method chooses a set of D eigenvectors from each branch to minimize the MSE on that branch. The full rank MMSE on the lth branch may be expressed in terms of the eigenvalues and eigenvectors of $R_{yy}$ as follows:

$$\xi_l = \sigma_{d_l}^2 - r_{yd_l}^H R_{yy}^{-1} r_{yd_l}$$  (20)

$$\phantom{\xi_l} = \sigma_{d_l}^2 - r_{yd_l}^H V \Lambda^{-1} V^H r_{yd_l}$$  (21)

$$\phantom{\xi_l} = \sigma_{d_l}^2 - \sum_{i=1}^{2L_c - L} \frac{|r_{yd_l}^H v_i|^2}{\lambda_i}$$  (22)

where $\sigma_{d_l}^2$ is the variance of $d_l(n)$, $r_{yd_l}$ is the cross-correlation between y(n) and $d_l(n)$, $v_i$ is the ith eigenvector of $R_{yy}$, and $\lambda_i$ is the associated eigenvalue. To minimize the MSE, the D eigenvectors with the largest values of $|r_{yd_l}^H v_i|^2 / \lambda_i$ are selected. This method can perform well even if D is smaller than the dimension of the signal subspace.

The disadvantage of both PC and CS is that they require an eigen-decomposition with a complexity of $O(L_c^3)$. In the next section, motivated by the MSW filter [1, 2], we present another reduced rank method which performs better than PC and CS and does not require matrix inversion or an eigen-decomposition.

4. MULTISTAGE DECOMPOSITION OF DECORRELATING RAKE RECEIVER

Once we form $d_l(n) = w_{c,l}^H x(n)$ and the data is blocked by the blocking matrix M, reduced rank MSW filtering can be applied to each arm to suppress the interference [1, 2]. A block diagram for a rank-3 MSW filter is shown in Figure 2.

The stages are associated with the sequence of nested filters $f_{l,1}, \ldots, f_{l,D}$, where $D \ll L_c$ is the order of the reduced rank filter. $B_{l,m}$ denotes a blocking matrix such that:

$$B_{l,m}^H f_{l,m} = 0$$  (23)

Referring to Figure 2, $\{f_{l,m}\}$ are determined by

$$f_{l,m} = E\{y_{l,m-1}(n) d_{l,m-1}^*(n)\}, \quad m = 1, \ldots, D$$  (24)

Here we assume that each blocking matrix $B_{l,m}$ is $2L_c \times 2L_c$ so that each vector $f_{l,m}$ is $2L_c \times 1$. It will be convenient to normalize the filters $f_{l,1}, \ldots, f_{l,D}$ so that $\|f_{l,m}\| = 1$.

The outputs of the filters $f_{l,1}, \ldots, f_{l,D}$ are linearly combined via the weights $w_l(1), \ldots, w_l(D)$ to obtain the filter output. This is accomplished stage by stage. Referring to Figure 2:

$$w_l(m) = \arg\min_{w_l(m)} E\{|\epsilon_{l,m-1}(n)|^2\}, \quad m = 1, \ldots, D$$  (25)

where

$$\epsilon_{l,m}(n) = d_{l,m}(n) - w_l^*(m+1)\, \epsilon_{l,m+1}(n).$$  (26)

Figure 2: Multistage RAKE Receiver for lth arm.

The rank-D MSW filter is given by the following set of recursions.

Initialization:

$$d_{l,0}(n) = d_l(n), \qquad y_{l,0}(n) = M x(n)$$  (27)

For m = 1, ..., D (Forward Recursion):

$$f_{l,m} = \frac{E\{y_{l,m-1}(n) d_{l,m-1}^*(n)\}}{\|E\{y_{l,m-1}(n) d_{l,m-1}^*(n)\}\|}$$  (28)

$$d_{l,m}(n) = f_{l,m}^H y_{l,m-1}(n)$$  (29)

$$B_{l,m} = I - f_{l,m} f_{l,m}^H$$  (30)

$$y_{l,m}(n) = B_{l,m}^H y_{l,m-1}(n)$$  (31)

Decrement m = D, ..., 1 (Backward Recursion):

$$w_l(m) = \frac{E\{d_{l,m-1}^*(n)\, \epsilon_{l,m}(n)\}}{E\{|\epsilon_{l,m}(n)|^2\}}$$  (32)

$$\epsilon_{l,m-1}(n) = d_{l,m-1}(n) - w_l^*(m)\, \epsilon_{l,m}(n)$$  (33)

where $\epsilon_{l,D}(n) = d_{l,D}(n)$. $\epsilon_{l,0}(n)$ is the final signal estimate of the lth arm and it is the input to the coherent combiner. The coherent combiner uses the principal eigenvector of the estimated covariance matrix formed by the signal estimates of each arm. Each stage has a complexity of $O(L^2)$, and the multistage decomposition does not need the complete estimation of the covariance matrix.
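The forward and backward recursions of the multistage decomposition can be illustrated with a short sketch. The code below is a minimal, real-valued illustration that replaces every expectation with a sample average over one data block; the names `msw_error`, `mean_square` and `synthetic_block`, the real-valued simplification, and the synthetic data are assumptions made for this example and are not taken from the paper.

```python
import random


def msw_error(y, d, rank):
    """Stage-0 error sequence of a rank-`rank` multistage Wiener filter.

    y    : list of real observation vectors (the blocked data y_0(n))
    d    : list of real desired samples (d_0(n))
    All expectations are replaced by sample averages over the block.
    """
    n = len(d)
    y_cur = [list(v) for v in y]
    stages = [list(d)]                      # stages[m] holds d_m(n)
    for _ in range(rank):                   # forward recursion
        d_prev = stages[-1]
        dim = len(y_cur[0])
        # sample estimate of E{y_{m-1}(n) d_{m-1}(n)}
        r = [sum(v[i] * dv for v, dv in zip(y_cur, d_prev)) / n
             for i in range(dim)]
        norm = sum(x * x for x in r) ** 0.5
        f = [x / norm for x in r]           # unit-norm filter f_m
        d_new = [sum(fi * vi for fi, vi in zip(f, v)) for v in y_cur]
        # y_m(n) = (I - f_m f_m^T) y_{m-1}(n)
        y_cur = [[vi - fi * dn for vi, fi in zip(v, f)]
                 for v, dn in zip(y_cur, d_new)]
        stages.append(d_new)
    eps = stages[rank]                      # eps_D(n) = d_D(n)
    for m in range(rank, 0, -1):            # backward recursion
        num = sum(a * e for a, e in zip(stages[m - 1], eps))
        den = sum(e * e for e in eps)
        w = num / den                       # scalar Wiener weight w(m)
        eps = [a - w * e for a, e in zip(stages[m - 1], eps)]
    return eps


def mean_square(x):
    return sum(v * v for v in x) / len(x)


def synthetic_block(n=400, dim=6, seed=0):
    """Hypothetical data: d(n) is a noisy linear function of y(n)."""
    rng = random.Random(seed)
    taps = [1.0, -0.5, 0.25, 0.8, -0.3, 0.1][:dim]
    y = [[rng.gauss(0, 1) * (1 + i) for i in range(dim)] for _ in range(n)]
    d = [sum(t * v for t, v in zip(taps, row)) + rng.gauss(0, 0.1)
         for row in y]
    return y, d
```

Calling `mean_square(msw_error(y, d, D))` for increasing D on the same block gives a non-increasing error, since each added stage can only reduce the sample mean square error. No covariance matrix is formed or inverted at any point, which is the property the section emphasizes.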


5.  SIMULATION  RESULTS 

Computer simulations have been conducted to examine the MSE performance and the convergence behavior of the proposed receiver. We consider a 10-user system with a spreading gain of 16 and 400 symbols. Signals go through three-ray multipath fading with an SNR of 10 dB. The results in Figure 3 and Figure 4 are averaged over 100 simulations.

Figure 3 shows the output MSE as a function of rank for the proposed receiver, CS, and PC. Results show that PC degrades rapidly with decreasing rank. On the other hand, for a given subspace dimension D, CS performs much better than PC. We also observe that CS achieves near full-rank performance when D = 22. The proposed receiver outperforms both CS and PC. It achieves near full-rank performance with less than half the number of weights used in the CS method.

Figure 3: The comparison of proposed receiver with eigen-based methods.

Figure 5: MSE vs. iteration for different receivers for D = 8.

Figure 4 plots the MSE for the following MSW-based receivers: training-based MMSE, MMSE with known signature waveform, the proposed decorrelating RAKE receiver, and the single arm receiver. The results show that our proposed method gives nearly identical MSE compared to the MMSE receiver with known signature waveform. The training-based MMSE performs best among all the receivers, whereas the single arm receiver [6] has the worst performance.


Figure  4:  The  comparison  of  different  receivers  with  varying 
rank. 

The convergence of the above methods is compared in Figure 5 for a similar scenario and D = 8. It is seen that the convergence rate of the proposed receiver is close to that of the MMSE receiver with known signature waveform. With large data size, the proposed method approaches the training-based MMSE receiver.

6. CONCLUSIONS

In this paper, we have proposed and demonstrated a reduced rank decorrelating RAKE receiver in the presence of frequency selective fading. The proposed RAKE receiver is based on the multistage Wiener (MSW) filter of Goldstein and Reed [2]. This receiver allows a large reduction in rank as well as in computational complexity relative to other reduced rank filters based on eigen-decomposition methods. Finally, the proposed method offers a performance similar to Honig's reduced rank MMSE receiver with the known signature waveform [1].
ABSTRACT

A new technique is proposed for robust multiuser detection in channels with impulsive noise. The method is a modification of traditional non-linear multiuser detection techniques, whereby the non-linearity is now positioned between the multiplier and the summation, within the correlator, instead of preceding it. This is then extended to provide a modification to the two-stage non-linear detector proposed in [1]. Simulation results show that the use of this technique increases multiple access interference (MAI) rejection. Near-far effects are also investigated.

1.  INTRODUCTION 

Conventional communication system models assume that the noise in the channel is Gaussian. However, in general, electromagnetic interference in channels displays impulsive behaviour, and is therefore non-Gaussian. Conventional detection methods are optimised for operation in Gaussian noise environments [2], and therefore, severe performance degradation occurs when the noise is non-Gaussian.

In Code-Division Multiple Access (CDMA) channels, this implies that fewer users can use the channel, for a given level of signal power. However, it has been shown [3] that, if properly treated, non-Gaussian noise can be beneficial to a system. It is necessary to design multiuser detection schemes that are robust to various levels of impulsive noise. In this paper, non-linear detection schemes are proposed in an attempt to improve the performance of multiuser detectors in impulsive noise channels.

The paper is structured as follows. In Section 2, the system model for DS-CDMA communications and the noise model for the impulsive noise channel are described. In Section 3, a modified multiuser detection model is proposed and simulation results are given. It is shown that such a scheme offers better MAI rejection than the conventional non-linear multiuser detection schemes and therefore offers better performance in a multiuser system. Based on this concept, a modification to the two-stage non-linear multiuser detector given in [1] is proposed in Section 4. It is shown that this model outperforms both the decorrelator and the two-stage non-linear detector. Finally, Section 5 contains some conclusions.

2.  SYSTEM  MODEL 

Consider a K-user, baseband direct-sequence code-division multiple access (DS-CDMA) system operating with a coherent binary phase shift keying (BPSK) modulation format, where signals are transmitted synchronously over the channel. The synchronous transmission model can be used without loss of generality. The continuous-time baseband signal received at the detector can be modelled as:

$$r(t) = \sum_{i=0}^{M-1} \sum_{k=1}^{K} A_k b_k(i) s_k(t - iT) + n(t)$$  (1)

where M is the length of the data block in symbols, per user, in the observed data frame, K is the number of active users, and T is the symbol period. $A_k$, $b_k \in \{-1, 1\}$ and $s_k(t)$ are the kth user's received amplitude, symbol and normalised signature waveform, respectively, and n(t) is the ambient channel noise, assumed to be independent and identically distributed. For the direct-sequence spread spectrum (DS-SS) multiple access format, the normalised signature waveforms take on the following form

$$s_k(t) = \sum_{j=1}^{N} c_k(j) P_{T_c}(t - (j-1)T_c)$$  (2)

where N is the processing gain, $c_k(j) \in \{-1, 1\}$ are the signature bits for the kth user, and $P_{T_c}$ is the normalised chip waveform with duration $T_c = T/N$. At the detector the received signal, r(t), is filtered by a chip-matched filter and then sampled at the chip rate. The received signal, for a single symbol interval, is then given by

$$r_j = \sum_{k=1}^{K} A_k b_k s_{kj} + n_j, \quad j = 1, \ldots, N$$  (3)

or in vector form,

$$r = SAb + n$$  (4)

where $S = (s_1, \ldots, s_K)$, with $s_k = (s_{k1}, \ldots, s_{kN})'$, $A = \mathrm{diag}(A_1, \ldots, A_K)$, $b = (b_1, \ldots, b_K)'$, where ' denotes transpose and n is the noise. It is well known that there are two types of interference in the detection of a single user's signal. The first type of interference is due to other users in the system. This is known as MAI, which is caused by the correlation between different users' signature waveforms. The second type of interference is due to the noise, which has a non-Gaussian distribution.
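The chip-rate model in (3) and (4), and the way MAI arises from correlated signatures, can be illustrated with a small noise-free sketch. The function names `received_chips` and `correlate` and the two 4-chip signatures used in the usage note are hypothetical choices for this example only.

```python
def received_chips(S, A, b, noise):
    """Chip-rate samples r_j = sum_k A_k b_k s_kj + n_j for one symbol.

    S     : list of K signature vectors, each of length N
    A, b  : per-user amplitudes and bits (+1 / -1)
    noise : list of N noise samples
    """
    n_chips = len(S[0])
    return [sum(A[k] * b[k] * S[k][j] for k in range(len(S))) + noise[j]
            for j in range(n_chips)]


def correlate(S, r):
    """Matched-filter outputs S'r, one value per user."""
    return [sum(sj * rj for sj, rj in zip(s, r)) for s in S]
```

With the unit-norm signatures `s1 = [0.5, 0.5, 0.5, 0.5]` and `s2 = [0.5, 0.5, 0.5, -0.5]` (cross-correlation 0.5), amplitudes `A = [1, 2]` and bits `b = [1, -1]`, the matched-filter output for user 1 is 1 + 0.5 * (-2) = 0, so the stronger interferer cancels the desired signal entirely even without noise. This is the MAI that the decorrelator used later in the paper is meant to remove.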

To model impulsive noise phenomena, a noise model, that is, a probability density function (PDF) with heavier tails than the Gaussian model, is required. The Cauchy distribution is considered in this paper to model the non-Gaussian noise process. The PDF of the Cauchy distribution is

$$f(x) = \frac{\gamma}{\pi(\gamma^2 + x^2)}$$  (5)

where $\gamma > 0$ is the spreading factor, similar to variance. It should be noted that the Cauchy model has an infinite variance, but despite this, it is useful to model impulsive noise due to its heavy tail behavior compared to the traditional Gaussian model. The Cauchy model is a special case of the symmetric-alpha-stable (SαS) parametric model. The signal to noise ratio (SNR) for an SαS process is defined as

$$\mathrm{SNR}_{\mathrm{dB}} = 20 \log_{10} \left( \ldots \right)$$  (6)

where $\alpha = 1$ for Cauchy noise and $\alpha = 2$ for Gaussian noise. When $\alpha = 2$ the SNR reduces to the conventional Gaussian SNR with $2\gamma = \sigma^2$.


3. MODIFIED NON-LINEAR MULTIUSER DETECTION

In this section, a modified non-linear detection model is proposed and analyzed. Firstly, the model is given and shown to be the solution to a modified least squares regression. Secondly, various non-linearities are introduced for testing of the system, and finally, the performance of the new detection technique is shown.

3.1. The Modified Detection Model and Least Squares

In an attempt to cause the conventional non-linear detectors to demonstrate greater levels of MAI rejection, and therefore better bit error rate (BER) performance in multiuser channels, we propose the modified non-linear multiuser detection model, shown in Figure 1. Note that, unlike in conventional non-linear detectors (see, for example, [3, 4, 5] and references therein), the non-linearity is placed between the multiplier and the summer within the conventional linear detection scheme, rather than before the multiplier. As a result, using the decorrelator [2], that is $x = (S'S)^{-1} S'$ in Figure 1, the MAI can be removed before the non-linearity affects the data. The data is then passed through the equivalent of a non-linear summation. We will refer to the structure in Figure 1 as the modified non-linear detector.

For a non-linearity $g(\cdot)$, it can be shown, for a single bit period, that the output of the modified non-linear detection scheme, using the decorrelator, is the solution to

$$\sum_{j=1}^{N} \left( r_j - \sum_{l=1}^{K} g^{-1}(\theta_l) s_{lj} \right) s_{kj} = 0, \quad k = 1, \ldots, K$$  (7)

where $\theta_l = A_l b_l$, $l = 1, \ldots, K$, which is a modified least squares estimate of $\theta = Ab$, that is

$$\hat{\theta} = \arg\min_{\theta} \| r - S g^{-1}(\theta) \|^2$$  (8)

The bit decision is made on the test statistic $T(r) = g(xr)$.

Fig. 1. Modified Non-Linear Multiuser Detection Model

3.2. The Non-Linearities

If the model of the noise in the channel is known to the receiver then, in conventional non-linear detection, a locally optimum non-linearity [4] can be used. However, this is not commonly the case and therefore sub-optimal non-linearities are required. Examples of these non-linearities can be seen in Figure 2. The parameters of the non-linearities were set for the highest SNR at the output of the non-linearity, using a first order approximation; the derivation is not included for brevity.

Fig. 2. Locally Optimal and Sub-Optimal Non-Linearities
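The effect of moving the non-linearity from in front of the correlator to between the multiplier and the summation, as described in Section 3.1, can be seen on a toy example. The sketch below assumes K = 2 users (so that the decorrelator has a closed-form 2x2 inverse), a clipper non-linearity with a hypothetical threshold, and a single large impulse on one chip; the function names and all numerical values are illustrative assumptions, not the paper's simulation setup.

```python
import math


def decorrelator_2user(s1, s2):
    """Rows of x = (S'S)^{-1} S' for two users (closed-form 2x2 inverse)."""
    r11 = sum(a * a for a in s1)
    r22 = sum(b * b for b in s2)
    r12 = sum(a * b for a, b in zip(s1, s2))
    det = r11 * r22 - r12 * r12
    x1 = [(r22 * a - r12 * b) / det for a, b in zip(s1, s2)]
    x2 = [(r11 * b - r12 * a) / det for a, b in zip(s1, s2)]
    return x1, x2


def clipper(u, c):
    """Hard limiter: passes |u| <= c unchanged, clips larger values to +/-c."""
    return max(-c, min(c, u))


def linear_statistic(x, r):
    """Plain decorrelator output: sum_j x_j r_j."""
    return sum(xj * rj for xj, rj in zip(x, r))


def modified_statistic(x, r, c):
    """Non-linearity between multiplier and summation: sum_j g(x_j r_j)."""
    return sum(clipper(xj * rj, c) for xj, rj in zip(x, r))


def demo_signals():
    """Two correlated 8-chip signatures, bits (+1, -1), impulse on chip 0."""
    scale = 1.0 / math.sqrt(8.0)
    s1 = [scale] * 8
    s2 = [scale * v for v in (1, 1, 1, 1, 1, -1, -1, -1)]
    clean = [a - b for a, b in zip(s1, s2)]   # A = (1, 1), b = (+1, -1)
    hit = list(clean)
    hit[0] += 50.0                            # one impulsive noise sample
    return s1, s2, clean, hit
```

In this example the impulse drives the linear decorrelator output for user 2 positive, which is the wrong sign, while `modified_statistic` with a clipping level of 0.5 limits the contribution of the corrupted chip and still yields a negative statistic. Because the multiplication by x happens before the clipping, the decorrelating weights are applied to the unclipped data.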


3.3.  Simulations  and  Results 

All simulations in this paper were carried out using random codes with a spreading gain of N = 30, in a Cauchy noise channel. Random codes were used to maximize the MAI so that the MAI rejection capabilities of the detectors could be tested. Figure 3 shows the BER against the number of users in the system, for SNR ≈ 5 dB and NFR = 0 dB. The first two curves use the conventional non-linear detection scheme [4], while the next two curves are plotted using the modified detection scheme. It can be seen that the modified detection scheme has better MAI rejection ability, since the rate of performance degradation as the number of users increases is less than that of the conventional non-linear detection scheme, even when the latter uses a locally optimum non-linearity. In a realistic multiuser system, with many users, the modified detection scheme gives better BER performance.

Figure 4 shows the BER against the NFR for the two different non-linear detection schemes, for K = 3 users and SNR ≈ 5 dB. It can be seen that when optimizing in terms of the BER, neither scheme is near-far resistant (compare the clippers). However, there exists a trade-off in the modified detection scheme between BER performance and near-far resistance, as in the conventional scheme. This can be seen in the two plots of the modified smooth limiting non-linearity with different parameters.

The performance of the detectors in Gaussian noise, using the clipper non-linearity with comparative parameter settings, is shown in Figure 5. The simulation was conducted for K = 3 users. With the conventional non-linear detection scheme, performance degradation occurs in a Gaussian noise environment, while if the modified detection model is used, there is little or no performance loss.
Fig. 3. Bit Error Rate versus the Number of Users


Fig. 5. Multiuser Detection in Gaussian Noise (Bit Error Rate versus Signal to Noise Ratio)


4.  MODIFIED  TWO-STAGE  NON-LINEAR  MULTIUSER 
DETECTION 

In this section we propose, and demonstrate the performance of, a modified version of the two-stage non-linear detector in [1]. Firstly, the modification, based on the modified non-linear multiuser detector proposed in Section 3, is shown and explained, and secondly, simulation results comparing its performance with the original two-stage non-linear detector and other detectors are presented.

4.1. Modified Two-Stage Non-Linear Detection Model

In [1] a two-stage non-linear detection scheme was proposed whereby the first stage consisted of a conventional non-linear multiuser detection scheme which estimated the MAI and removed it from the signal, and the second stage consisted of a single user conventional non-linear detector. A modification to this detector is proposed for two reasons. Firstly, the problem with the near-far resistance in the modified non-linear multiuser detector can be reduced. Secondly, the modified non-linear detector has better BER performance in a high user system than the conventional non-linear scheme, and therefore the performance of the first stage of the two-stage non-linear detector can be improved using the modified non-linear scheme. It is also helpful that optimum performance in the modified scheme does not require a locally optimum non-linearity.

The modified two-stage non-linear multiuser detector can be seen in Figure 6. The first stage uses the modified detection model to estimate the MAI and then removes it from the received signal, leaving a single user detection problem in impulsive noise. The conventional non-linear detector was used in the second stage since it performs better than the modified one in the single user case.
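The two-stage structure described above can be sketched for a two-user case. This is only an illustration under several assumptions that the text does not specify: the received amplitudes are taken as known, the first stage makes hard bit decisions that are used to rebuild the MAI, both stages use a clipper, and the second stage clips the residual before correlating with the user's own signature. The function names and the numerical values are hypothetical.

```python
import math


def clipper(u, c):
    return max(-c, min(c, u))


def sign(v):
    return 1 if v >= 0 else -1


def decorrelator_2user(s1, s2):
    """Rows of x = (S'S)^{-1} S' for two users (closed-form 2x2 inverse)."""
    r11 = sum(a * a for a in s1)
    r22 = sum(b * b for b in s2)
    r12 = sum(a * b for a, b in zip(s1, s2))
    det = r11 * r22 - r12 * r12
    x1 = [(r22 * a - r12 * b) / det for a, b in zip(s1, s2)]
    x2 = [(r11 * b - r12 * a) / det for a, b in zip(s1, s2)]
    return x1, x2


def two_stage_detect(s, amps, r, c1, c2):
    """Bit decisions for two users; s = [s1, s2], amps assumed known."""
    x = decorrelator_2user(s[0], s[1])
    # Stage 1: modified detector (clip after the multiplier), tentative bits
    tentative = [sign(sum(clipper(xj * rj, c1) for xj, rj in zip(xk, r)))
                 for xk in x]
    decisions = []
    for k in range(2):
        other = 1 - k
        # Remove the estimated MAI contributed by the other user
        resid = [rj - amps[other] * tentative[other] * sj
                 for rj, sj in zip(r, s[other])]
        # Stage 2: conventional non-linear single-user detector
        stat = sum(sj * clipper(ej, c2) for sj, ej in zip(s[k], resid))
        decisions.append(sign(stat))
    return decisions


def demo_case():
    """Two 8-chip signatures, bits (+1, -1), one impulse on chip 0."""
    scale = 1.0 / math.sqrt(8.0)
    s1 = [scale] * 8
    s2 = [scale * v for v in (1, 1, 1, 1, 1, -1, -1, -1)]
    r = [a - b for a, b in zip(s1, s2)]
    r[0] += 50.0
    return [s1, s2], [1.0, 1.0], r
```

For the data returned by `demo_case()`, calling `two_stage_detect(s, amps, r, 0.5, 0.5)` recovers the transmitted bits despite the impulse, because the clipper bounds the corrupted chip in both stages.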


Fig.  4.  Near-Far  Characteristics 


Fig.  6.  Modified  Two-Stage  Non-Linear  Detector 
Fig. 7. Multiuser Detection in Cauchy Noise


4.2.  Simulations  and  Results 

The choice of the non-linearities was as follows. The clipper was chosen for $g_1(u)$ due to its good MAI rejection capabilities and simplicity, which make it useful for detecting individual users' signals in MAI. Since the second stage is essentially the detection of a single user in impulsive noise, a locally optimal non-linearity would give the optimum results. However, since it is unlikely that the exact noise model would be known at the receiver, a locally sub-optimum non-linearity is used. The clipper is also chosen for $g_2(u)$ since it performs very closely to the locally optimum case in a single user environment [6].

Figure 7 shows the results of simulations with K = 6 users and NFR = 10 dB. The performance gain using the model of Figure 6 can be clearly seen. Figure 8 compares the near-far characteristics of the detectors, for SNR ≈ 5 dB with K = 6 users. It can be seen that the near-far characteristic is similar to that of a decision-feedback detector. The modified two-stage non-linear detector outperforms the two-stage non-linear detector at all levels of the NFR, increasingly so as the magnitude of the NFR increases.


5.  CONCLUSIONS 

The modified detection schemes investigated were shown to improve performance in terms of MAI rejection over the conventional non-linear detection schemes, in multiuser environments with impulsive noise. The scheme involves placing the non-linearity between the multiplier and the summation, rather than before the multiplier. This gain is achieved with little or no increase in computational cost, and the requirement for locally optimum non-linearities is alleviated. The proposed model was also shown to improve performance in a Gaussian noise environment, over conventional non-linear detection techniques. Near-far characteristics were also improved by using the modified two-stage approach.



Fig.  8.  Near-Far  Performance  Comparison 
ABSTRACT

The existing dual-rate blind linear detectors, which operate in either the low-rate (LR) or the high-rate (HR) mode, are not strictly blind in the HR mode and lack theoretical analysis. This paper proposes subspace-based LR and HR blind linear detectors, i.e., blind decorrelating detectors (BDD) and blind MMSE detectors (BMMSED), for synchronous DS/CDMA systems. To detect an LR data bit in the HR mode, an effective weighting strategy is proposed. Theoretical analyses of the performance of the proposed detectors are carried out. It is proved that the bit-error-rate of the LR-BDD is superior to that of the HR-BDD and that the near-far resistance of the LR blind linear detectors outperforms that of their HR counterparts. The extension to asynchronous systems is also described. Simulation results show that the adaptive dual-rate BMMSEDs outperform the corresponding non-blind dual-rate decorrelators proposed in [2].


1.  INTRODUCTION 

The 3rd-generation wireless communications systems must be able to accommodate heterogeneous traffic, such as voice, data and video, which have different data rates and quality-of-service requirements. This makes it imperative to develop multirate DS/CDMA receivers. As a result, several receivers originally proposed for single-rate DS/CDMA systems have been investigated for their use in multirate cases. Typical examples are the low-rate decorrelator (LRD) and the high-rate decorrelator (HRD) for dual-rate synchronous systems [1, 2], which match to the bit interval of the low-rate (LR) users and the high-rate (HR) users, respectively. To overcome their requirement for prior knowledge of the interfering users, the LR and HR blind linear MMSE detectors (LMMSED) were proposed for DS/CDMA systems in [3] and [4]. However, the performances of LR-LMMSED and HR-LMMSED are compared only by numerical simulations. Also, to decode an LR data bit at the HR mode, the signal to interference-plus-noise ratio (SINR) of this LR user within each subinterval, which involves the knowledge of the noise level and the interfering users, is required to weight the partial result obtained in the corresponding subinterval. Thus, HR-LMMSED is not strictly blind.


This paper proposes the subspace-based LR and HR blind decorrelating detectors (BDD) and blind MMSE detectors (BMMSED), inspired by the single-rate counterpart [5]. A blind weighting scheme is proposed to detect the LR data bits at the HR mode. The theoretical analyses of the proposed blind detectors are carried out. The extension to asynchronous systems is also described.


2. SIGNAL MODEL

In this work, the baseband signal is assumed to be dual-rate binary DS/CDMA with variable spreading factor (VSF). However, all the results presented in this paper can be easily generalized to a general multirate system. We also assume that the system is synchronous in the formulation of the problem. The asynchronous case will be discussed in Subsection 3.4. We denote the processing gain of the LR users as $N_0$ and that of the HR users as $N_1$, where $N_0/N_1 = M$ is an integer. The normalized signatures are denoted by $s_{k,0}$ and $s_{k,1}$ for the kth LR and HR user, respectively.

In a single LR bit interval, each HR user can be viewed as M virtual LR users. The kth LR user transmits bit $b_{k,0}$ with received amplitude $A_{k,0}$. The mth virtual user corresponding to the kth HR user transmits bit $b_{k,1}(m)$ with the received amplitude $A_{k,1}$ using the signature sequence $s_{k,1}^{(m)}$, which is equal to $s_{k,1}$ in the mth subinterval and otherwise is zero. The received signal in a single LR bit interval can be written as an $N_0$-vector

$$r = \sum_{k=1}^{K_0} A_{k,0} b_{k,0} s_{k,0} + \sum_{k=1}^{K_1} \left\{ \sum_{m=0}^{M-1} A_{k,1} b_{k,1}(m) s_{k,1}^{(m)} \right\} + n$$

$$\phantom{r} = S_0 A_0 b_0 + S_1 A_1 b_1 + n$$

$$\phantom{r} = \begin{bmatrix} S_0 & S_1 \end{bmatrix} \begin{bmatrix} A_0 & 0 \\ 0 & A_1 \end{bmatrix} \begin{bmatrix} b_0 \\ b_1 \end{bmatrix} + n$$

$$\phantom{r} = SAb + n$$  (1)

where $S_0$ consists of the signatures of the $K_0$ LR users, $A_0 = \mathrm{diag}\{A_{1,0}, \ldots, A_{K_0,0}\}$ and $b_0 = [b_{1,0} \cdots b_{K_0,0}]^T$; $S_1 = [s_{1,1}^{(0)} \cdots s_{K_1,1}^{(0)} \cdots s_{1,1}^{(M-1)} \cdots s_{K_1,1}^{(M-1)}]$, $A_1 = \mathrm{diag}\{\bar{A}_1, \ldots, \bar{A}_1\}$ where $\bar{A}_1 = \mathrm{diag}\{A_{1,1}, \ldots, A_{K_1,1}\}$, and $b_1 = [b_1^T(0) \cdots b_1^T(M-1)]^T$, where $b_1(m)$ is ordered as $b_1(m) = [b_{1,1}(m) \cdots b_{K_1,1}(m)]^T$. n is an additive white Gaussian noise (AWGN) vector with the covariance matrix $\sigma^2 I_{N_0}$. In addition, the received signal can also be modeled in each subinterval as an $N_1$-vector

$$r^{(m)} = S^{(m)} \bar{A} b^{(m)} + n^{(m)}, \quad 0 \le m \le M - 1,$$  (2)

where $S^{(m)} = [S_0^{(m)} \;\; \bar{S}_1]$, $S_0^{(m)}$ is the portion of $S_0$ within the mth subinterval, and $\bar{S}_1$ consists of the signatures of the $K_1$ HR users. $b^{(m)} = [b_0^T \;\; b_1^T(m)]^T$, and $\bar{A} = \mathrm{diag}\{A_0, \bar{A}_1\}$. $n^{(m)}$ is the corresponding noise vector.
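The virtual-user view of the HR users in a single LR bit interval can be made concrete with a short sketch. It builds the zero-padded signatures of the M virtual users of one HR user and sums all contributions over one LR bit interval; the function names, the tuple layout of the user descriptions, and the numbers in the usage note are assumptions made for illustration.

```python
def virtual_signatures(s_hr, M):
    """Zero-padded signatures of the M virtual LR users of one HR user.

    s_hr has length N1. Each returned vector has length N0 = M * N1, equals
    s_hr inside the m-th subinterval, and is zero elsewhere.
    """
    n1 = len(s_hr)
    out = []
    for m in range(M):
        v = [0.0] * (M * n1)
        v[m * n1:(m + 1) * n1] = s_hr
        out.append(v)
    return out


def received_lr_interval(lr_users, hr_users, M, noise):
    """Received N0-vector over one LR bit interval.

    lr_users : list of (amplitude, bit, signature of length N0)
    hr_users : list of (amplitude, [M bits], signature of length N1)
    noise    : list of N0 noise samples
    """
    r = list(noise)
    for amp, bit, s in lr_users:
        for j, sj in enumerate(s):
            r[j] += amp * bit * sj
    for amp, bits, s in hr_users:
        for m, v in enumerate(virtual_signatures(s, M)):
            for j, vj in enumerate(v):
                r[j] += amp * bits[m] * vj
    return r
```

For example, with M = 2, one LR user `(1, 1, [0.5, 0.5, 0.5, 0.5])` and one HR user `(2, [1, -1], [0.5, -0.5])`, the HR user contributes `[1, -1, 0, 0]` in the first subinterval and `[0, 0, -1, 1]` in the second, so each HR bit only affects its own segment of the LR bit interval.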
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3.  DUAL-RATE  BLIND  LINEAR 
DETECTORS 

This  section  will  develop  the  LR-BDD,  LR-BMMSED.  HR- 
BDD,  and  HR-BMMSED. 


BMMSED  can  be  estimated  based  on  the  updated  signal 
subspace  and  some  intermediate  variables  of  the  OPAST 
algorithm.  Due  to  lack  of  space,  the  details  of  this  adaptive 
algorithm  are  not  given  in  this  paper. 

3.2  The  HR  Blind  Linear  Detectors 


3.1  The  LR  Blind  Linear  Detectors 

In  terms  of  model  (1),  in  a  single  LR  bit  interval,  the  dual-rate 
system  with  K0  LR  users  and  K\  HR  users  is  equivalent  to  a 
single-rate  system  with  KL=K0+MK ,  users.  For  convenience,  we 
assume  that  the  data  bit,  the  signature  and  the  received  amplitude 
of  the  Mi  user  are  represented  by  bk,  sj.  and  Ah  respectively, 
whose  physical  meanings  can  be  readily  understood  via  (1). 

Without  loss  of  generality,  we  assume  that  rank(S)  =  K L .  By 

performing  an  eigendecomposition,  the  covariance  matrix  of  the 
received  signal  r  can  be  represented  by 

C  =  £{iZr}  =  EsYlEjr+E„Y„E^  (3) 

where  E,  is  an  orthonormal  basis  of  the  signal  subspace  and  E„ 
is  that  of  the  noise  subspace  orthogonal  to  E, .  Y,  contains  the 
Kl  largest  eigenvalues  of  C  and  Y„  =a1\,,  K  .  Based  on  the 

subspace  parameters,  a  linear  detector  for  demodulating  the  Mi 
user’s  data  bit  can  be  written  as  (by  following  the  single-rate  case 
in  [5]): 

bk=sga(£kr),  (4) 

where 


In  terms  of  model  (2),  in  the  mth  subinterval,  each  HR  user  k 
transmits  one  bit  using  the  signature  sk , .  Each  LR  user  k 

transmits  the  mth  segment  i‘"'0>  of  the  signature  st  0 .  Thus,  it  is 

equivalent  to  KH=K0+Kt  HR  users  simultaneously  transmitting 
one  data  bit  in  each  subinterval.  For  convenience,  we  enumerate 
all  active  users  such  that  the  Mi  LR  user  is  numbered  k  while  the 
Mi  HR  user  is  numbered  ( K0+k ). 

We  assume  thatrank(S("l>)  =  KH  .  The  eigendecomposition  is 

performed  on  the  covariance  matrix  of  the  received  signal  r"" 

and  the  subspace  parameters  such  as  E‘,")  and  Y,1"0  can  be 

obtained.  In  the  with  subinterval,  a  linear  detector  for  detecting 
the  Mi  HR  user’s  data  bit  can  be  written  as 

htl(m)  =  sgn(^";V”'>),  (9) 

where 

(10) 

£<”')^Y<™>  _<J2I  )-iE(mP  HR-BDD 

(11) 

E(».y(»)-,E(»)T  ^  HR  -  BMMSED 


±k  =D£*AlD£t 

5 

(5) 

oJE/X-aT, 

)-'£[,  LR-BDD 

(6) 

[e.yX, 

LR-BMMSED 

Clearly,  the  two  LR  blind  linear  detectors  become  identical  as 
<7  — >  0  .  The  scalar  constant  l/£,  Dst  has  no  effect  on  the  signal 
detection  and  thus  can  be  removed.  Note  that  the  implicit 
assumption  that  the  exact  signal  covariance  matrix  and  thus  its 
eigencomponents  are  known  is  impractical.  Generally,  the 
subspace  parameters  must  be  estimated  from  the  received  signals 
using  batch  eigenvalue  decomposition  of  sample  covariance 
matrix,  or  batch  singular  value  decomposition  of  sample  matrix, 
or  adaptive  subspace  tracking  algorithms.  For  LR-BDD,  the  BER 
of  the  Mi  user  can  be  expressed  as  (by  following  the  single-rate 
case  in  [5]) 

Pt=^At/a^[R-\ty  (7) 

where  R=SrS  and  Q(x)  =  J7-^exp (-t2/2)dt .  The  NFR  [rjk  ]' 
is  thus 

lnkf  =V[r-]m  .  (8) 

LR-BMMSED  has  the  same  NFR  as  LR-BDD. 

We  also  developed  an  adaptive  algorithm  for  the  estimation  of 
LR-BMMSED.  The  signal  subspace  is  first  updated  by  the 
orthonormal  PAST  (OPAST)  algorithm  [6],  and  then  LR- 


The  scalar  constant  l/ s_\  |D<"l)£t  l  can  also  be  dropped.  For  HR- 

BDD,  the  BER  of  HR  user  k  in  the  mth  subinterval  can  be 
determined  by 

Hr = q[a,/  aV[R("r,w.+»)  ’  (!2> 

where  R(m)=S,m)/S(m).  The  NFR  [^J"  for  the  HR  blind  linear 
detectors  is 

=v'[R'"rl],v1,,.1  ■  (i3) 

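Since the duration of an LR data bit spans $M$ HR bit intervals,
the following decision rule is used to estimate a data bit of LR
user $k$:

$$\hat{b}_{k,0} = \operatorname{sgn}\left(\sum_{m=0}^{M-1}\frac{\mathbf{d}_{k,0}^{(m)T}\mathbf{r}^{(m)}}{\mathbf{d}_{k,0}^{(m)T}\mathbf{d}_{k,0}^{(m)}}\right), \quad (14)$$

where

$$\mathbf{d}_{k,0}^{(m)} = \mathbf{D}^{(m)}\mathbf{s}_{k,0}^{(m)}. \quad (15)$$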


Note that the employed weighting factors are the reciprocals of the
detector coefficients' energy (proportional to the output noise
power), and need no knowledge of the noise level or the
interfering users. This means that the contribution from a
subinterval to the decision is inversely proportional to the output
noise power in this subinterval. This strategy is reasonable since
the output noise level is dominant over or comparable to the
multiple-access interference (MAI) after multiuser detection is
applied, which is particularly true for HR-BDD where the MAI is
completely suppressed. For HR-BDD, since $\mathbf{d}_{k,0}^{(m)T}\mathbf{s}_{k,0}^{(m)} = 1$,
$\mathbf{d}_{k,0}^{(m)T}\mathbf{s}_{j}^{(m)} = 0$ $(j \neq k)$ and $\mathbf{d}_{k,0}^{(m)T}\mathbf{d}_{k,0}^{(m)} = [\mathbf{R}^{(m)-1}]_{kk}$ [5], the BER of
LR user $k$ can be given by

(16)
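
The weighting idea behind the LR decision rule can be illustrated with a short sketch: each subinterval's soft output is divided by the energy of its detector coefficients before the sign is taken. The detector and received vectors below are hypothetical toy values, and the function names are our own.

```python
def dot(u, v):
    """Inner product of two real-valued lists."""
    return sum(a * b for a, b in zip(u, v))


def lr_bit_decision(detectors, received):
    """Sum the per-subinterval outputs d^T r, each divided by d^T d,
    and return the sign of the sum together with the statistic."""
    statistic = 0.0
    for d, r in zip(detectors, received):
        statistic += dot(d, r) / dot(d, d)
    bit = 1 if statistic >= 0 else -1
    return bit, statistic


# Two subintervals (M = 2); the second detector has a much larger energy,
# i.e. a noisier output, so its contribution is scaled down.
detectors = [[1.0, 0.0], [2.0, 2.0]]
received = [[0.4, 0.1], [-0.3, -0.5]]
bit, stat = lr_bit_decision(detectors, received)
unweighted = sum(dot(d, r) for d, r in zip(detectors, received))
```

In this toy case the unweighted sum is negative while the weighted statistic is positive, which shows how the noisier subinterval is prevented from dominating the decision.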

The  NFR  [i]k  0]"  for  the  HR  blind  linear  detectors  is  then 


[rh,  o] 


H 


(17) 


Note  that  the  adaptive  HR-BMMSED  can  also  be  derived  in  a 
similar  manner  to  the  adaptive  LR-BMMSED. 


3.3  A  Comparison  of  LR  and  HR  Blind  Linear 
Detectors 


It is easy to see that the use of the LR blind linear detectors
incurs a detection delay for the HR users. In addition, both the LR and
HR blind linear detectors involve the computation of the
subspace parameters, and the computational complexity of the
former becomes much higher than that of the latter as the ratio $M$
increases. We have the following proposition for the BER
performance of LR-BDD and HR-BDD.


Proposition 1: For a general dual-rate synchronous system, if
the exact subspace parameters are known, then

$$P^{\text{HR-BDD}} \geq P^{\text{LR-BDD}}, \quad (18)$$

where $P^{\text{HR-BDD}}$ and $P^{\text{LR-BDD}}$ are the BERs of HR-BDD and LR-BDD,
respectively, for the LR or HR user's bits in any
subinterval. In particular, both achieve the same BER for each LR
user if the signatures for the LR users are the same in every
subinterval, i.e., if the repetition code is employed.


Proof. Note that the proposed dual-rate BDDs have the same
BER expressions as the corresponding dual-rate non-blind ones
[1,2], which involve a particular diagonal element of the inverse of the
signature correlation matrix, $\mathbf{R}^{-1}$ or $\mathbf{R}^{(m)-1}$. They differ in that
this element for the latter can be accurately calculated using the
known signatures of all the users, while it can only be
approximately estimated from the received signal for the former.
Thus, if the exact subspace parameters are known, the former
should have the same BER as the latter. Since a similar proposition
has been proven for the latter [1,2], inequality (18) must hold as
well.
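
As an illustrative check of the proposition, consider a hypothetical toy case that is not taken from the paper: $M = 2$, one LR user whose unit-norm signature is a repetition of a unit-norm half $\mathbf{a}$, one HR user with unit-norm signature $\mathbf{s}_1$, and $\rho = \mathbf{a}^T\mathbf{s}_1$ with $|\rho| < 1$. The sketch below compares the diagonal elements of the inverse correlation matrices that determine the HR user's BER in the two modes.

```latex
% LR mode: columns are the LR signature [a; a]/sqrt(2) and the two HR bits
\mathbf{R} =
\begin{bmatrix}
1 & \rho/\sqrt{2} & \rho/\sqrt{2} \\
\rho/\sqrt{2} & 1 & 0 \\
\rho/\sqrt{2} & 0 & 1
\end{bmatrix},
\qquad
\det\mathbf{R} = 1 - \rho^2,
\qquad
[\mathbf{R}^{-1}]_{22} = \frac{1 - \rho^2/2}{1 - \rho^2}.

% HR mode: columns are the normalized LR half a and the HR signature
\mathbf{R}^{(m)} =
\begin{bmatrix}
1 & \rho \\
\rho & 1
\end{bmatrix},
\qquad
[\mathbf{R}^{(m)-1}]_{22} = \frac{1}{1 - \rho^2} \;\geq\; [\mathbf{R}^{-1}]_{22}.
```

Since a smaller diagonal element gives a larger argument of the Q function, the HR user's BER under LR-BDD is no larger than under HR-BDD in this toy case, with equality only when $\rho = 0$.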


Based on the above proposition and the fact that the Q function is
monotonically decreasing, we can conclude that the NFR of the
LR blind linear detectors is not inferior to that of their HR counterparts.
Furthermore, the above proposition and inference can be
extended to the practical multirate scenario since a similar
extension for dual-rate non-blind decorrelators has been proven
in [1].


3.4  Asynchronous  Case 

Both LR and HR blind linear detectors can be applied to
asynchronous systems. The same formulae as in the synchronous
case can be used for asynchronous systems with an increased
dimension, due to the fact that the number of virtual bits and their
associated virtual signatures within the processing interval is
larger than that in the synchronous case [3]. In the LR mode, if
the desired user is an LR user, the desired LR user is viewed as
the reference user, whose bit interval is taken as the processing
interval. Otherwise an arbitrary LR user can be chosen as the
reference user. For each user other than the reference one, there
might be two virtual bits located at both ends of the processing
interval, whose full-length signatures are partitioned into two
virtual ones. Therefore, within the processing interval, the
number of data bits (including the actual and virtual bits) is
between $K_0 + MK_1$ and $2K_0 + (M+1)K_1 - 1$. To detect the data
bit of the desired HR user, which is divided into two virtual ones,
a similar strategy to (14) can be used to weight the partial
estimates over two successive processing intervals before a final
decision is made. It should be noted that in the LR mode, the
number of data bits and their associated signatures are the same
within the different processing intervals.

In the HR mode, if the desired user is an LR user, an arbitrary
HR user can be chosen as the reference user; otherwise the
desired HR user is viewed as the reference user. For each HR
user other than the reference one, there might be two virtual bits
located at both ends of the processing interval with full-length
signatures being segmented into two virtual ones. For each LR
user, there are either one or two virtual bits, whose associated
virtual signatures are only a portion of the full-length signatures.
Therefore, the number of data bits is between $K_0 + K_1$ and
$2K_0 + 2K_1 - 1$. A similar weighting strategy to (14) can be used
to demodulate a data bit of the desired LR user. Note that in the
HR mode, the number of the data bits and their associated
signatures might change among the different processing
intervals. Therefore, special attention should be paid to the
implementation of the HR blind linear detectors for
asynchronous systems.
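
The bounds on the number of actual plus virtual data bits per processing interval can be tabulated with a small sketch. It assumes the bounds read $K_0 + MK_1$ to $2K_0 + (M+1)K_1 - 1$ in the LR mode and $K_0 + K_1$ to $2K_0 + 2K_1 - 1$ in the HR mode, with $K_0$ LR users, $K_1$ HR users and rate ratio $M$; the function names are our own.

```python
def lr_mode_bit_count_range(k0, k1, m):
    """Lower and upper bounds on the number of actual plus virtual bits
    in one LR-mode processing interval."""
    return k0 + m * k1, 2 * k0 + (m + 1) * k1 - 1


def hr_mode_bit_count_range(k0, k1):
    """Lower and upper bounds on the number of actual plus virtual bits
    in one HR-mode processing interval."""
    return k0 + k1, 2 * k0 + 2 * k1 - 1


# Example with two LR users, two HR users and rate ratio M = 2.
lr_range = lr_mode_bit_count_range(2, 2, 2)
hr_range = hr_mode_bit_count_range(2, 2)
```

The lower bounds correspond to the synchronous case; the upper bounds grow because every non-reference user may contribute one extra virtual bit at the edge of the interval.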

Based  on  the  results  in  the  previous  sections,  it  is  obvious  that 
the  BER  (or  NFR)  performances  of  the  dual-rate  blind  linear 
detectors  are  related  to  the  signature  correlation  matrix  in  the 
underlying  processing  interval.  Considering  that  the  signature 
correlation  matrix  depends  on  the  relative  delays  of  all  the  users 
and  the  choice  of  the  reference  user,  it  is  impractical  to  make  a 
theoretical  comparison  on  the  performances  of  the  LR  and  HR 
blind linear detectors. However, the numerical simulations in
Section 4 show that Proposition 1 may be invalid for a
general dual-rate asynchronous system.

4.  SIMULATIONS  AND  CONCLUSIONS 

We assume that there are four active users in our simulations.
The first two users are the LR users ($K_0 = 2$) and the others are the
HR users ($K_1 = 2$). Other basic parameters are $N_1 = 7$, $N_0 = 14$ and
$M = 2$. Four Gold sequences of length 7 are intentionally chosen
such that we have the worst cross-correlation among all signatures.
The first two sequences are used by the LR users, whose
signatures are generated via the repetition-code scheme proposed
in [2]. The other sequences are assigned to the HR users. We
assume that User 1 is the desired LR user and User 4 is the
desired HR user. For Figs. 1-2, the signal-to-noise ratio (SNR) of
the desired user is fixed at 8 dB and the SNRs of all the other

Figure 2. The BER curves of User 1 (a) and User 4 (b) versus
the SNRs of the other users

ABSTRACT

A novel CDMA receiver with adaptive multiple access interference
(MAI) and narrowband interference (NBI) suppression is
proposed for reverse link signal reception over multipath channels.
The design of the receiver involves the following procedure. First,
adaptive correlators are constructed based on the generalized sidelobe
canceller (GSC) scheme to collect the multipath signals and
suppress strong MAI and time-varying NBI. In particular, partial
adaptivity is incorporated into the GSC for reduced complexity
processing. A simple combiner with channel estimation then gives
the symbol decisions. In order to enhance the performance of the
adaptive correlators, a decision aided scheme is introduced which
subtracts the reconstructed signal from the input data of the GSC,
leading to improved output SINR and convergence. The proposed
CDMA receiver is evaluated through simulations, and the results
show that it can outperform the conventional MMSE receiver designed
for time-invariant interference.

1.  INTRODUCTION 

A major limiting factor for a CDMA system is the multiple access
interference (MAI). MAI causes the near-far problem and a
reduction in system capacity. Adaptive multiuser detectors and
interference cancellers have been suggested which provide immunity
to the near-far effect [1]. On the other hand, time-varying narrowband
interference (NBI) due to the presence of overlay systems
can be treated as a group of equivalent MAI, and suppressed with a
large adaptive degree of freedom [3]. With fully adaptive (FA)
implementation, an adaptive CDMA receiver requires $(M-1)$-dimensional
processing, where $M$ is the processing gain. To reduce the complexity,
partially adaptive (PA) implementation is suggested as an
alternative in which the size of the adaptive weights is reduced by
judiciously designed criteria [2]. The advantages of PA implementation
include reduced complexity and faster convergence. However,
the performance of interference cancellation usually degrades as
a result of a smaller processing dimension. A trade-off must thus
be reached between complexity and performance.

In this paper, a novel CDMA receiver with enhanced joint
MAI and NBI suppression is proposed for a reverse link pilot
symbol-assisted system over multipath channels. The development
of the new scheme involves the following procedure. First,
a set of adaptive correlators is constructed based on the generalized
sidelobe canceller (GSC) [4] scheme to collect the multipath
signals and suppress strong MAI and time-varying NBI. In particular,
partial adaptivity is incorporated into the GSC for reduced complexity
MAI and NBI suppression. This is achieved by selecting a
reduced dimension subspace of the column space of the blocking
matrix of the GSC encompassing the MAI signatures obtained in a
multi-user scenario and NBI signatures obtained in overlay applications.
Next, a simple combiner with channel estimation gives the
symbol decisions. In order to further enhance the convergence performance
of the GSC adaptive correlators, a decision aided scheme
is introduced in which the signal waveform is first estimated and
then subtracted from the input data of the correlators. The proposed
CDMA receiver is evaluated through simulations, and the
results show that it can outperform the optimal MMSE receiver
designed for time-invariant interference scenarios.

2.  DATA  MODEL 

Suppose that there are $K$ active users in a CDMA system. The $k$th
user's contribution to the received signal can be written as

$$d_k(t) = \sum_i b_k(i)\, s_k(t - iT) \quad (1)$$

where $b_k(i)$ denotes the $i$th transmitted information bit, $T$ is the
bit duration, and $s_k(t)$ is the signature waveform given by

$$s_k(t) = \sum_{m=0}^{M-1} c_k[m]\, p(t - mT_c) \quad (2)$$

where $c_k[m]$ is the spreading sequence of the $k$th user, $M$ is the
spreading factor, $p(t)$ is the chip waveform, and $T_c$ is the chip
duration. The transmission channel is modeled with $L$ resolved
Rayleigh fading paths. Putting the $K$ user signals together, the
received baseband data can be expressed in the following form:

$$x(t) = \sum_{k=1}^{K}\sum_{l=1}^{L} \alpha_{k,l}\, d_k(t - \tau_{k,l}) + i(t) + n(t) \quad (3)$$

where $\tau_{k,l}$ and $\alpha_{k,l}$ are the delay and complex gain, respectively,
of the $l$th path of the $k$th user, $i(t)$ is the NBI, and $n(t)$ is the
additive white noise with power $\sigma_n^2$. To fully exploit the temporal
signature, $x(t)$ is chip matched filtered and sampled. Assuming
user 1 to be the desired user, the resulting chip-sampled data over
the $i$th symbol can be put into the $(M+L-1)\times 1$ vector:

$$\mathbf{x}(i) = [x(0), x(1), \ldots, x(M+L-2)]^T = \sum_{l=1}^{L}\alpha_{1,l}\,\mathbf{c}_{\tau_{1,l}}\, b_1(i) + \mathbf{i}(i) + \mathbf{n}(i) \quad (4)$$

where $\mathbf{c}_{\tau_{1,l}}$ is the augmented signature vector associated with the
$l$th path of user 1, $\mathbf{i}(i)$ is the interference vector, and $\mathbf{n}(i)$ is the
noise vector. Depending on the delay $\tau_{1,l}$, $\mathbf{c}_{\tau_{1,l}}$ is given by one of
the columns of the $(M+L-1)\times L$ matrix:

$$\mathbf{C} = [\mathbf{c}_{1,1}\ \mathbf{c}_{1,2}\ \cdots\ \mathbf{c}_{1,L}] \quad (5)$$

where $\mathbf{c}_{k,l}$ is the $(M+L-1)\times 1$ vector with $[c_k[0], c_k[1], \ldots, c_k[M-1]]^T$
occupying the $l$th to $(l+M-1)$th entries. Note that $\mathbf{i}(i)$ includes
ISI, NBI and MAI. For NBI, a more realistic assumption is
to model it as a data-like signal, e.g., a BPSK signal with signaling
rate much slower than the CDMA system chip rate [3]:

$$i(t) = A_I \sum_i b_I(i)\, p(t - iT_I) \quad (6)$$

where $b_I(i)$ denotes the $i$th NBI bit, $T_I$ is the bit duration, and $A_I$
is the complex gain. The bit rate $1/T_I$ is assumed $Q$ times slower
than the chip rate such that $T_I = QT_c$.

From (4), the effective "composite" signature vector of the $k$th
user is given by:

$$\mathbf{h}_k = \sum_{l=1}^{L}\alpha_{k,l}\,\mathbf{c}_{\tau_{k,l}} \quad (7)$$

A receiver for user $k$ is designed to identify and remove $\mathbf{h}_k$ to
retrieve the data bits $b_k(i)$ from $\mathbf{i}(i)$ and $\mathbf{n}(i)$. In particular, a
linear receiver combines $\mathbf{x}(i)$ using a weight vector $\mathbf{w}$ to obtain

$$\hat{b}_k(i) = \mathbf{w}^H\mathbf{x}(i) \quad (8)$$

where $H$ denotes the complex conjugate transpose.
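
The construction of the augmented and composite signature vectors can be sketched as follows. This is a minimal illustration under stated assumptions: the $l$th path is taken to map to the $l$th column of the signature matrix, the spreading code and path gains are hypothetical toy values, and a matched filter is used as the weight vector purely for illustration. All function names are our own.

```python
def augmented_signature(code, path_index, num_paths):
    """Zero-padded (M + L - 1)-vector with the spreading code occupying
    entries path_index .. path_index + M - 1 (1-based path index)."""
    m = len(code)
    vec = [0.0] * (m + num_paths - 1)
    for n, chip in enumerate(code):
        vec[path_index - 1 + n] = chip
    return vec


def composite_signature(code, gains):
    """Weighted sum of the augmented signatures, one per path."""
    num_paths = len(gains)
    h = [0j] * (len(code) + num_paths - 1)
    for l, gain in enumerate(gains, start=1):
        c = augmented_signature(code, l, num_paths)
        h = [hv + gain * cv for hv, cv in zip(h, c)]
    return h


def linear_receiver_output(w, x):
    """Soft output w^H x for complex-valued lists."""
    return sum(wv.conjugate() * xv for wv, xv in zip(w, x))


code = [1.0, -1.0, 1.0]            # toy spreading code, M = 3
gains = [1.0 + 0.0j, 0.5j]         # toy path gains, L = 2
h = composite_signature(code, gains)

# Noise-free received vector for a transmitted bit of -1, despread with
# the matched filter w = h (an illustrative choice of weight vector).
x = [-hv for hv in h]
soft = linear_receiver_output(h, x)
```

The real part of the soft output carries the sign of the transmitted bit, scaled by the energy of the composite signature.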

3.  PROPOSED  RECEIVER 

The proposed receiver is adaptive and is suitable
for pilot symbol-assisted systems. Its overall schematic diagram is
depicted in Figure 1. Without loss of generality, it is assumed that
user 1 is the desired one and the others are MAI.

3.1.  GSC  Realization  of  Adaptive  Correlators 

Conventionally, in order to restore the processing gain and retain
the path diversity, $\mathbf{x}(i)$ is despread at each of the $L$ fingers using a
set of discrete-time correlators:

$$z_{1,l}(i) = \mathbf{w}_{1,l}^H\mathbf{h}_1 b_1(i) + \mathbf{w}_{1,l}^H\mathbf{i}(i) + \mathbf{w}_{1,l}^H\mathbf{n}(i) \quad (9)$$

for $l = 1, \ldots, L$, where $\mathbf{w}_{1,l}$ is the correlator weight vector at
the $l$th finger. For an effective suppression of MAI, these weight
vectors are determined in accordance with the linearly constrained
minimum variance (LCMV) criterion. To avoid signal cancellation
incurred with coherent multipaths, the LCMV correlators can
be implemented in the form of GSC [4]. The concept of GSC is
to decompose the weight vector $\mathbf{w}_{1,l}$ into two orthogonal components
as $\mathbf{w}_{1,l} = \mathbf{c}_{1,l} - \mathbf{B}\mathbf{u}_{1,l}$. The matrix $\mathbf{B}$ is a "signal blocking"
matrix which removes user 1's signal before filtering. Note
that $\mathbf{B}$ must block signals from within the entire delay spread in
order to avoid signal cancellation due to coherent multipaths. The
goal is then to choose the adaptive weight vectors $\mathbf{u}_{1,l}$ to cancel the
MAI and NBI. According to the GSC scheme, $\mathbf{u}_{1,l}$ is determined
via the minimum mean square error (MMSE) criterion:

$$\min_{\mathbf{u}_{1,l}} E\{|\mathbf{c}_{1,l}^H\mathbf{x}(i) - \mathbf{u}_{1,l}^H\mathbf{B}^H\mathbf{x}(i)|^2\} = \min_{\mathbf{u}_{1,l}} \|\mathbf{R}_x^{1/2}\mathbf{B}\mathbf{u}_{1,l} - \mathbf{R}_x^{1/2}\mathbf{c}_{1,l}\|^2 \quad (10)$$

where the data correlation matrix is $\mathbf{R}_x = E\{\mathbf{x}(i)\mathbf{x}^H(i)\}$. Solving
for $\mathbf{u}_{1,l}$ and putting the $\mathbf{w}_{1,l}$'s in matrix form, we get

$$\mathbf{W} = [\mathbf{w}_{1,1}, \mathbf{w}_{1,2}, \ldots, \mathbf{w}_{1,L}] = [\mathbf{I} - \mathbf{B}(\mathbf{B}^H\mathbf{R}_x\mathbf{B})^{-1}\mathbf{B}^H\mathbf{R}_x]\mathbf{C} \quad (11)$$

The matrix $\mathbf{B}$ can be chosen to be a full rank $(M+L-1)\times(M-1)$
matrix whose columns are orthogonal to $\{\mathbf{c}_{1,1}, \ldots, \mathbf{c}_{1,L}\}$, i.e.,
$\mathbf{B}^H\mathbf{C} = \mathbf{O}$ and $\mathbf{B}^H\mathbf{B} = \mathbf{I}$. With a large $M$, $\mathbf{u}_{1,l}$ will have a large
size too, leading to a high computational load and poor convergence
in real-time implementation [4]. To alleviate this problem,
the PA GSC is proposed which uses only a portion of the available
degrees of freedom offered by the adaptive weights. Specifically,
the PA techniques can be employed to reduce the size of $\mathbf{B}$.
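
The GSC weight computation can be sketched numerically for a small real-valued case, where the conjugate transpose reduces to an ordinary transpose. The 3-dimensional correlation matrix, signature and blocking matrix below are hypothetical toy values chosen only so that the blocking matrix is orthonormal and orthogonal to the signature; the helper names are our own.

```python
def transpose(a):
    return [list(row) for row in zip(*a)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def solve(a, b):
    """Solve a x = b (b given as a matrix) by Gauss-Jordan elimination
    with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def gsc_weights(r_x, b, c):
    """W = [I - B (B^T R B)^{-1} B^T R] C for real-valued inputs."""
    bt_r = matmul(transpose(b), r_x)
    u = solve(matmul(bt_r, b), matmul(bt_r, c))   # adaptive weights
    bu = matmul(b, u)
    return [[cv - buv for cv, buv in zip(crow, burow)]
            for crow, burow in zip(c, bu)]


r_x = [[2.0, 0.5, 0.3], [0.5, 1.5, 0.2], [0.3, 0.2, 1.0]]
c = [[1.0], [0.0], [0.0]]                    # desired signature
b = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]     # blocks the desired signature
w = gsc_weights(r_x, b, c)

# Orthogonality check: B^T R w should vanish at the MMSE solution.
residual = matmul(matmul(transpose(b), r_x), w)
```

The vanishing residual reflects the orthogonality principle of the MMSE solution, and the first entry of the weight vector stays at one because the adaptive part only acts in the column space of the blocking matrix.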

3.2.  Partially  Adaptive  Implementation 

It is noteworthy from (10) that the optimal GSC weight vector
$\mathbf{u}_{1,l}$ is the one for which $\mathbf{R}_x^{1/2}\mathbf{B}\mathbf{u}_{1,l}$ is the vector in the subspace
$\mathcal{R}\{\mathbf{R}_x^{1/2}\mathbf{B}\}$ that is closest to
$\mathbf{R}_x^{1/2}\mathbf{c}_{1,l}$. In other words, $\mathbf{R}_x^{1/2}\mathbf{B}\mathbf{u}_{1,l}$ should be in the direction
that exhibits maximum "correlation" with $\mathbf{R}_x^{1/2}\mathbf{c}_{1,l}$. It is therefore
desired that the blocking matrix $\mathbf{B}$ be chosen such that the
crosscorrelation $\rho = |\mathbf{c}_{1,l}^H\mathbf{R}_x\mathbf{B}\mathbf{u}_{1,l}|$ is large. Since $\mathbf{B}^H\mathbf{C} = \mathbf{O}$,
the only way to maximize $\rho$ is to retain as much MAI and NBI as
possible. This in turn suggests that a suitable method for implementing
the PA receiver is to find a reduced size $\mathbf{B}$ that can retain
as much MAI and NBI as possible.

In a multi-user scenario, the MAI's effective signatures can
be obtained by pilot symbol-assisted channel estimation and by exploiting
the corresponding spreading sequences. In particular, the
effective signature vector of a user observed at the receiver is given
by the convolution of the spreading codes with the FIR channel response.
Accordingly, from (4), the estimated $(M+L-1)\times 1$
composite signature vector (CSV) of user $i$ can be expressed as

$$\hat{\mathbf{h}}_i = \sum_{l=1}^{L}\hat{\alpha}_{i,l}\,\mathbf{c}_{\tau_{i,l}}, \quad i = 2, \ldots, K \quad (12)$$

where $\hat{\alpha}_{i,l}$ is obtained by pilot symbol-assisted
channel estimation at the $l$th finger. Given these CSV estimates, a
reduced size blocking matrix $\mathbf{B}_{1m}$ can be constructed by projecting
onto the column space of $\mathbf{B}$ the set of vectors $\{\hat{\mathbf{h}}_i\}$ readily
obtained in a multi-user scenario. Therefore, the new blocking
matrix of the PA GSC for MAI suppression can be obtained as

$$\mathbf{B}_{1m} = \mathbf{B}\mathbf{B}^H[\hat{\mathbf{h}}_2 \cdots \hat{\mathbf{h}}_K] \quad (13)$$
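
The projection that produces the reduced size blocking matrix can be sketched for a real-valued toy case. The blocking matrix, desired signature and single interferer signature below are hypothetical values, and the helper names are our own.

```python
def transpose(a):
    return [list(row) for row in zip(*a)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def reduced_blocking_matrix(b, signatures):
    """Project the signature columns onto the column space of B:
    B_reduced = B B^T H (real-valued case, B with orthonormal columns)."""
    return matmul(b, matmul(transpose(b), signatures))


b = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]   # orthonormal, orthogonal to c
c = [1.0, 0.0, 0.0]                        # desired user's signature
h = [[0.7], [0.2], [-0.4]]                 # one estimated interferer CSV
b_reduced = reduced_blocking_matrix(b, h)

# The reduced blocking matrix still blocks the desired signature.
leak = sum(cv * row[0] for cv, row in zip(c, b_reduced))
```

The result has one column per interferer, so the adaptive weight vector shrinks from the full blocking dimension to the number of interferers while the desired signal remains blocked.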

On the other hand, the NBI signatures can be obtained in overlay
applications. In particular, a narrowband linearly modulated
signal can be treated as a group of related, virtual spread spectrum
signals with simple spreading codes [5]. Therefore, the NBI's effective
signatures can be obtained by pre-estimating and exploiting
these virtual spreading codes. Given these CSV estimates of NBI,
a reduced size blocking matrix $\mathbf{B}_{1n}$ can be constructed by projecting
onto the column space of $\mathbf{B}$ the set of virtual spreading
codes, $\{\mathbf{g}_i\}$, $i = 1, \ldots, \lfloor (M+L-1)/Q \rfloor + 1$. Therefore, the reduced
size blocking matrix of the PA GSC for NBI suppression is given
by

$$\mathbf{B}_{1n} = \mathbf{B}\mathbf{B}^H[\mathbf{g}_1 \cdots \mathbf{g}_{\lfloor (M+L-1)/Q \rfloor + 1}] \quad (14)$$

For joint MAI and NBI suppression, it is straightforward to construct
the new blocking matrix $\mathbf{B}_{1p} = [\mathbf{B}_{1m}, \mathbf{B}_{1n}]$. Note that
$\mathbf{B}_{1p}$ can be regarded as the "smallest" blocking matrix with the
number of columns equal to the number of "real" MAI and "virtual"
MAI (due to NBI).

3.3. GSC Structure for Time-Varying NBI

Consider again the NBI in (6). It is noteworthy that if $G = T/T_I$
is an integer, then the NBI can be treated as $G$ MAI's which can
be suppressed with a time-invariant receiver. However, if the ratio
$G$ is not an integer, the detection rule would be time-varying. In
general, it is plausible to assume that the ratio $G$ is a rational number
[3] such that the NBI sequence is a periodically time-varying
one with a period equal to $CT$, where $C$ is the smallest integer
such that $CT$ is an integer multiple of $T_I$. In other words, the NBI
can be decomposed into $C$ time-invariant parts, each representing
$G$ MAI's. This fact in turn implies that the data correlation matrix
$\mathbf{R}_x$ is itself periodically time-varying with period $C$, thus resulting
in the following periodically time-varying GSC processing:

$$\mathbf{w}_{1,l}(m) = [\mathbf{I} - \mathbf{B}(\mathbf{B}^H\mathbf{R}_x(m)\mathbf{B})^{-1}\mathbf{B}^H\mathbf{R}_x(m)]\mathbf{c}_{1,l} \quad (15)$$

for $m = 1, \ldots, C$, where $\mathbf{w}_{1,l}(m)$ is the weight vector of the $m$th
part and $\mathbf{R}_x(m)$ is the correlation matrix constructed by collecting
data samples over the $(qC + m)$th symbols, $q = 0, 1, 2, \ldots$. As
a result, $C$ different sets of correlator weights should be designed,
with each set corresponding to a time-invariant component of the
NBI. As shown in Figure 1, the input data is fed into the bank of
$C$ correlators, processed, and then combined back to a serial data
stream corresponding to a finger.
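
The period $C$ can be computed directly when the symbol and NBI bit durations are expressed as integer numbers of chips, since $C$ is then the denominator of the reduced fraction $G = T/T_I$. The sketch below uses the values quoted in the simulation section (a window of $31 + 4 - 1 = 34$ chips and $T_I = 12T_c$); the function name is our own.

```python
from fractions import Fraction


def nbi_period_in_symbols(symbol_chips, nbi_bit_chips):
    """Smallest integer C such that C*T is an integer multiple of T_I,
    with T and T_I given as integer numbers of chips."""
    ratio = Fraction(symbol_chips, nbi_bit_chips)   # G = T / T_I, reduced
    return ratio.denominator


c_period = nbi_period_in_symbols(34, 12)      # G = 17/6
c_integer_ratio = nbi_period_in_symbols(36, 12)   # G = 3, time-invariant
```

For the simulated configuration this gives six correlator sets per finger, while an integer ratio collapses the bank to a single time-invariant set.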

3.4. RAKE Combining and Decision Aided Signal Reconstruction

With the time-varying PA correlator bank constructed, the next
step is to perform a maximum ratio combining of the correlator
outputs to collect the multipath energy. Since the MAI and NBI
have been removed, channel estimation for the desired user can be
done accurately, leading to improved performance as compared to
the conventional RAKE receiver. However, the GSC is blind in
nature and usually exhibits slow convergence due to the residual
signal effect. To remedy this, a decision aided scheme is introduced
in which the signal is estimated and then subtracted from
the input data before the computation of GSC adaptive weights.
First, assume that at the $j$th iteration, the $m$th part of the received
data, $\mathbf{x}(m,i)$, is available and despread into:

$$z_{1,l}^{(j)}(m,i) = \mathbf{w}_{1,l}^{(j)H}(m)\,\mathbf{x}(m,i) \quad (16)$$

where $\mathbf{w}_{1,l}^{(j)}(m)$ is estimated by (15), but using the "signal-subtracted"
data $\mathbf{y}^{(j)}(m,i)$ as the input. With $z_{1,l}^{(j)}(m,i)$ available, we can obtain
the channel estimate using a sequence of $N_p$ pilot symbols:

1  N’’ 

=  ^-£5uW)  07) 

,=1 

With the channel estimate $\hat{\alpha}_{1,l}^{(j)}(m)$, the random phase of the $l$th
finger output $z_{1,l}^{(j)}(m,i)$ is removed and coherent RAKE combining
is achieved by

$$z_1^{(j)}(m,i) = \sum_{l=1}^{L}\hat{\alpha}_{1,l}^{(j)*}(m)\, z_{1,l}^{(j)}(m,i) \quad (18)$$

which is then sent to the data decision device:

$$\hat{b}_1^{(j)}(m,i) = \mathrm{dec}\{z_1^{(j)}(m,i)\}$$

$$\hat{b}_1^{(j)}(k) = \mathrm{P/S}[\hat{b}_1^{(j)}(m,i)], \quad k = (i-1)C + m \quad (19)$$

where P/S is the parallel to serial transform. Second, signal reconstruction
is done by exploiting the channel estimate $\hat{\alpha}_{1,l}^{(j)}(m)$, the
desired user's signature $\mathbf{c}_{1,l}$ and data decisions $\hat{b}_1^{(j)}(i)$ as follows:

$$\hat{\mathbf{s}}^{(j)}(m,i) = \hat{b}_1^{(j)}(m,i)\sum_{l=1}^{L}\hat{\alpha}_{1,l}^{(j)}(m)\,\mathbf{c}_{1,l} \quad (20)$$

Finally, the reconstructed signal is subtracted from the data sent to
the $(j+1)$th iteration, which yields

$$\mathbf{y}^{(j+1)}(m,i) = \mathbf{x}(m,i) - \hat{\mathbf{s}}^{(j)}(m,i) \quad (21)$$

By using $\mathbf{y}^{(j+1)}(m,i)$ as the GSC input, the slow convergence
can be effectively mitigated. The above procedure
can be iterated several times to gain further improvement.
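
One pass of the combine, decide, reconstruct and subtract loop can be sketched as below. This is a simplified illustration under stated assumptions: the path gains are taken as known rather than estimated from pilots, plain correlators matched to the path signatures stand in for the GSC weights, the data are noise-free, and all numerical values and function names are hypothetical.

```python
def inner(a, b):
    """a^H b for complex-valued (or real-valued) lists."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def decision_aided_step(x, finger_weights, signatures, gains):
    """Despread per finger, combine coherently, decide the bit,
    reconstruct the desired signal and subtract it from the input."""
    finger_outputs = [inner(w, x) for w in finger_weights]
    combined = sum(g.conjugate() * z for g, z in zip(gains, finger_outputs))
    bit = 1 if combined.real >= 0 else -1
    recon = [bit * sum(g * c[n] for g, c in zip(gains, signatures))
             for n in range(len(x))]
    residual = [xv - sv for xv, sv in zip(x, recon)]
    return bit, combined, residual


signatures = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # toy path signatures
gains = [1 + 1j, 0.5 - 0.5j]                      # toy path gains
x = [-(1 + 1j), -(0.5 - 0.5j), 0j]                # bit -1, no noise

bit, combined, residual = decision_aided_step(x, signatures, signatures, gains)
```

In this noise-free toy case the residual is zero, so the data passed to the next iteration would contain only interference and noise, which is the condition under which the blind GSC adaptation is expected to converge faster.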

4.  COMPUTER  SIMULATIONS 

As a performance index, we define the output SINR to be the ratio
of the signal power to the interference-plus-noise power at the
receiver output. The input SNR is defined to be $E\{|d_1(t)|^2\}/\sigma_n^2$
and the near-far-ratio (NFR) is the ratio of the MAI power to signal
power before despreading. The path gains $\alpha_{k,l}$'s are assumed
independent, identically distributed unit variance complex Gaussian
random variables, the path delays $\tau_{k,l}$'s are assumed uniform
over $[0, 3T_c]$, and the number of paths was $L = 4$ for all users.
All simulations involved $K$ equal power CDMA signals spread by
the Gold code of length 31 and two equal power BPSK NBI's with
$T_I = 12T_c$, $C = 6$ ($6 \times (31 + 4 - 1) = 17 \times 12$) and the ratio of
the signal power to NBI power was -20 dB. The number of fingers
was $L = 4$, and the input SNR was 0 dB. The PA dimension
was chosen to be $P = (K-1) + 4$, and each result was obtained
from 100 independent trials. For comparison, we also included the
results obtained with the MMSE [1], RAKE and proposed FA receivers.
For all receivers, $N_p = 300$ pilot symbols were used for
correlator weight vector computation and channel estimation.

In Figure 2, with $K = 10$ and NFR = 10 dB, it is observed
that the proposed PA receiver converges in three iterations, outperforms
the MMSE receiver, and reaches the performance of the
FA receiver. The system capacity is then evaluated in Figure 3
with NFR = 10 dB. Again, the proposed receiver outperforms the
MMSE receiver and gives the performance of the FA receiver. Finally,
the near-far resistance is evaluated in Figure 4 with $K = 10$.
As observed, the proposed receiver achieves its excellent near-far
resistance by successfully rejecting the MAI and NBI.

5.  CONCLUSION 

A decision aided receiver with partially adaptive interference suppression
has been proposed. It is designed with the following procedure.
First, a set of adaptive correlators implemented in the form
of GSC is constructed to collect multipath signals and suppress strong
MAI and time-varying NBI. In particular, partial adaptivity is incorporated
into the GSC for reduced complexity, which is achieved
by selecting a reduced dimension subspace of the column space of
the blocking matrix encompassing the MAI and NBI signatures.
Next, a simple maximum ratio combiner gives the symbol decisions.
In order to enhance the convergence performance of the
adaptive correlators, a decision aided scheme is introduced which
subtracts the reconstructed signal from the input of the GSC. The
proposed CDMA receiver is evaluated through simulations, and
the results show that it can outperform the conventional MMSE
receiver designed for time-invariant interference.

Figure 1: Structure of proposed CDMA receiver with partially
adaptive interference suppression

Figure 2: Output SINR versus iteration number with K = 10,
NFR = 10 dB and SNR = 0 dB

Figure 3: Output SINR versus user number with NFR = 10 dB and
SNR = 0 dB

Figure 4: Output SINR versus NFR with K = 10 and SNR = 0 dB

ABSTRACT

This  paper  introduces  an  Iterative  Space-time  Soft  Estimator 
(ISSE)  that  jointly  performs  linear  channel  estimation  and  soft 
data  detection  in  time-varying  Multiple-Input  Multiple-Output 
(MIMO)  channels,  both  according  to  the  Minimum  Mean  Squared 
Error  (MMSE)  criterion.  We  introduce  a  scalar  approximation  of 
the  channel  autocovariance  function  that  relies  on  the  statistical 
homogeneity of the multipath scattering and allows us to derive a
globally-convergent  burst-adaptive  algorithm  to  estimate  and  track 
the  channel  statistics. 

1.  INTRODUCTION 

It has been recently demonstrated that deploying multiple
transmitting and receiving antennae in a wireless link can lead to
a significant capacity increase as long as multipath propagation is
adequately exploited [1]. Further combination of vector coding
techniques with signal processing methods at the receiver has led
to the development of the so-called Space-Time Coding (STC)
concept [2].

A  relevant  signal  processing  problem  for  the  implementation 
of  STC  systems  is  the  estimation  of  the  Multiple  Input  Multiple 
Output  (MIMO)  channel.  In  a  practical  environment,  the  MIMO 
channel  should  be  considered  both  as  time-dispersive,  leading  to 
severe  Inter-Symbol  Interference  (ISI),  and  time-varying.  The 
conventional  approach  to  address  the  channel  variability  is  to 
use  adaptive  algorithms  such  as  Least  Mean  Squares  (LMS)  and 
Recursive  Least  Squares  (RLS)  [3].  However,  it  has  been  shown 
[4,  5]  that  taking  explicitly  into  account  the  time-varying  nature  of 
the  channel  leads  to  an  improved  performance.  The  most  common 
approach  is  to  use  a  block-adaptive  procedure  [4,  5]  that  consists 
of  computing  a  set  of  snapshot  estimates  of  the  channel  in  different 
(e.g.,  equally  spaced)  observation  windows  using  training  data 
and, then, interpolating the channel coefficients between successive
snapshots. 

In  this  paper,  we  propose  a  burst-iterative  space-time  scheme 
that  alternates  channel  estimation  and  data  detection.  A  linear 
channel  estimator  is  derived,  according  to  the  Minimum  Mean 
Squared  Error  (MMSE)  criterion,  that  involves  the  second  order 
statistics  of  the  random  channel  process.  Under  a  homogeneous 
scattering  assumption,  we  show  that  the  channel  autocovariance 
can  be  characterized  by  a  single  scalar  function,  and  we  derive 
a  globally-convergent  burst-adaptive  algorithm  to  estimate  it. 

This  work  has  been  supported  by  FEDER  funds  (1FD97-0082)  and 
Xunta  de  Galicia  (PGIDT00PXI10504PR). 


Although  the  latter  result  does  not  exactly  hold  for  arbitrary 
scattering  models,  computer  simulations  show  that  there  is  only 
a  slight  performance  loss,  whereas  a  significant  reduction  in 
information requirements (i.e., knowledge of the full channel
autocovariance  matrices)  is  achieved.  Since  the  linear  channel 
estimator  also  depends  on  the  transmitted  symbols,  we  propose 
to  iteratively  alternate  channel  estimation  and  data  detection 
until  convergence  and  show  that  this  approach  achieves  better 
performance  than  existing  block-adaptive  channel  estimation 
methods. 

In  the  next  section,  the  system  and  signal  model  are 
introduced.  The  channel  estimator  is  derived  in  section  3  and 
data  detection  is  considered  in  section  4.  Illustrative  computer 
simulations  are  shown  in  section  5  and  section  6  is  devoted  to  the 
conclusions. 


2.  SYSTEM  AND  SIGNAL  MODEL 

Let us consider a wireless communication system with $N$ antennae
at the transmitter and $L$ antennae at the receiver. The block
diagram of such a system is depicted in figure 1. The information
bits to be transmitted, $\{b(l)\}_{l=0,1,2,\ldots}$, are fed into a channel
encoder and interleaver to yield a coded bit sequence. A
Serial to Parallel (S/P) converter followed by a bank of $N$
Waveform Encoders (WE) and transmitting antennae transforms
this sequence into the identically-modulated information-bearing
signals, $s_1(t), \ldots, s_N(t)$. Transmission is carried out in bursts
of $NK\log_2(A)$ bits, i.e., $K$ complex symbols per transmitting
element, assuming that the WE's use a keying format with $\log_2(A)$
bits per symbol. Multipath propagation occurs between each
pair of transmitting and receiving antennae, resulting in a time-dispersive
MIMO channel. At the receiver, a bank of $L$ Matched
Filters (MF) sampled at the symbol rate, $1/T$, is used to obtain
$L \times 1$ vectors of observations, $\mathbf{x}(n) = [x_1(n), \ldots, x_L(n)]^T$,
$n = 0, 1, \ldots, K-1$. These observations are sufficient statistics
for an Iterative Space-time Soft Estimator (ISSE) device to
obtain estimates, $\mathbf{y}(n) = [y_1(n), \ldots, y_N(n)]^T$, $n = 0, 1, \ldots, K-1$,
of the complex transmitted symbols $\mathbf{s}(n) = [s_1(n), \ldots, s_N(n)]^T$,
$n = 0, 1, \ldots, K-1$. A Parallel to Serial
(P/S) converter produces the one-dimensional symbol sequence
$\{y(l)\}_{l=0,1,2,\ldots,NK-1}$ that is processed by a deinterleaver
and channel decoder to obtain the final hard estimates of the
information bits, $\{\hat{b}(l)\}_{l=0,1,\ldots,NK\log_2(A)}$.

Fig. 1. Discrete-time model of a multiaccess wireless communication system.

When a linear memoryless keying format for the WE's is employed, the following discrete-time signal model is obtained for the observations

$$\mathbf{x}(n) = \sum_{l=0}^{m-1} \mathcal{H}_l(n)\,\mathbf{s}(n-l) + \mathbf{g}(n) = \mathbf{H}(n)\,\bar{\mathbf{s}}(n) + \mathbf{g}(n)$$

where $\mathbf{s}(n)$ is the n-th transmitted symbol vector, $\bar{\mathbf{s}}(n) = [\mathbf{s}^T(n-m+1) \cdots \mathbf{s}^T(n)]^T$ is the n-th $Nm \times 1$ received symbol vector (note the existence of causal ISI), $m$ is the maximum length of the time-varying discrete channel impulse response, $\mathbf{g}(n)$ is an $L \times 1$ vector of independent and identically distributed (i.i.d.) complex Gaussian components with zero mean and variance $\sigma_g^2$, and

$$\mathbf{H}(n) = [\mathcal{H}_{m-1}(n) \;\cdots\; \mathcal{H}_0(n)]$$

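is the n-th realization of the $L \times Nm$ random channel matrix. If we let vector

$$\mathbf{h}_{ij}(n) = [h^0_{ij}(n), h^1_{ij}(n), \ldots, h^{m-1}_{ij}(n)]^T$$

denote the (time-varying) channel impulse response between the j-th and the i-th transmitting and receiving elements, respectively, we can write down the $L \times N$ components of $\mathcal{H}_l(n)$, $l = 0, \ldots, m-1$, as

$$\mathcal{H}_l(n) = \begin{bmatrix}
h^l_{11}(n) & h^l_{12}(n) & \cdots & h^l_{1N}(n) \\
h^l_{21}(n) & h^l_{22}(n) & \cdots & h^l_{2N}(n) \\
\vdots & \vdots & \ddots & \vdots \\
h^l_{L1}(n) & h^l_{L2}(n) & \cdots & h^l_{LN}(n)
\end{bmatrix}$$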

According to the Gaussian Wide Sense Stationary Uncorrelated Scattering (GWSSUS) model, the channel coefficients in $\mathbf{H}(n)$ are assumed to be stationary independent Gaussian random processes with zero mean, variance $\sigma_{ij}^2$, and autocovariance function $\phi^l_{ij}(k) = E[h^{l*}_{ij}(n)\,h^l_{ij}(n+k)]$, where superindex $*$ denotes complex conjugation.


3.  LINEAR  MMSE  CHANNEL  ESTIMATION 


The linear MMSE estimator for the channel coefficients at time $n$ is obtained by processing a window of $M + 1$ observation vectors (even $M$) as

$$\begin{aligned}
\mathrm{MSE}(\mathbf{W}, i, n) &= E\left[\mathrm{Trace}\left[\|\mathbf{X}(i)\mathbf{W} - \mathbf{H}(n)\|^2\right]\right] \\
\mathbf{W}(i,n) &= \arg\min_{\mathbf{W}} \{\mathrm{MSE}(\mathbf{W}, i, n)\} \qquad (1) \\
\hat{\mathbf{H}}(i,n) &= \mathbf{X}(i)\,\mathbf{W}(i,n) \qquad (2)
\end{aligned}$$

where $E[\cdot]$ denotes statistical expectation, $\|\mathbf{M}\|^2 = \mathbf{M}^H\mathbf{M}$ for an arbitrary matrix $\mathbf{M}$, with superindex $H$ meaning Hermitian transposition, $\mathbf{X}(i) = [\mathbf{x}(i - \frac{M}{2}) \cdots \mathbf{x}(i + \frac{M}{2})]$ is the $L \times (M+1)$ observation matrix used to estimate $\mathbf{H}(n)$, $\mathrm{MSE}(\cdot, i, n)$ is the associated mean squared-error cost function, $\mathbf{W}(i,n)$ is the $(M+1) \times Nm$ Linear MMSE (LMMSE) filter and $\hat{\mathbf{H}}(i,n)$ is the MIMO channel estimate. Notice that there are several optimum filters for estimating $\mathbf{H}(n)$, depending on the observation window that is used (hence the two indices, $i$ and $n$). Although, for most practical channels, the best performance is achieved for $n = i$, we will consider the more general case of computing different channel estimates using the same window.

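Problem (1) is purely quadratic and presents a closed-form solution

$$\mathbf{W}(i,n) = \left(E[\mathbf{X}^H(i)\mathbf{X}(i)]\right)^{-1} E[\mathbf{X}^H(i)\mathbf{H}(n)] = \left(\mathcal{R}(i) + L\sigma_g^2\,\mathbf{I}_{M+1}\right)^{-1} \mathcal{V}(i,n). \qquad (3)$$

In the above expression, $\mathcal{R}(i)$ is the $(M+1) \times (M+1)$ noiseless autocorrelation matrix, $\mathbf{I}_{M+1}$ is the $(M+1) \times (M+1)$ identity matrix and $\mathcal{V}(i,n)$ is the $(M+1) \times Nm$ cross covariance matrix. The element in the r-th row and c-th column of $\mathcal{R}(i)$ turns out to be

$$[\mathcal{R}(i)]_{r,c} = \bar{\mathbf{s}}^H(i - \tfrac{M}{2} + r)\,\mathbf{R}_H(r - c)\,\bar{\mathbf{s}}(i - \tfrac{M}{2} + c),$$

where $r, c = 0, \ldots, M$ and

$$\mathbf{R}_H(k) = E[\mathbf{H}^H(n)\mathbf{H}(n+k)]$$

is the MIMO channel autocovariance $Nm \times Nm$ matrix. Similarly, the r-th row vector of $\mathcal{V}(i,n)$ is

$$[\mathcal{V}(i,n)]_r = \bar{\mathbf{s}}^H(i - \tfrac{M}{2} + r)\,\mathbf{R}_H(|n - i + \tfrac{M}{2} - r|), \quad r = 0, \ldots, M.$$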
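As a quick sanity check of (3), and purely as an illustration that is not part of the original derivation, consider the degenerate window $M = 0$, for which $\mathbf{X}(i) = \mathbf{x}(i)$. Both $\mathcal{R}(i)$ and the noise term are then scalars, and $\mathcal{V}(i,n)$ is a single row, so the filter and the channel estimate reduce to

$$\mathbf{W}(i,n) = \frac{\bar{\mathbf{s}}^H(i)\,\mathbf{R}_H(|n-i|)}{\bar{\mathbf{s}}^H(i)\,\mathbf{R}_H(0)\,\bar{\mathbf{s}}(i) + L\sigma_g^2}, \qquad \hat{\mathbf{H}}(i,n) = \mathbf{x}(i)\,\mathbf{W}(i,n).$$

In this special case the estimate is a rank-one matrix: the single observation is mapped back through the known symbol vector and shrunk by the noise term $L\sigma_g^2$. Larger windows replace the scalar division by the inversion of an $(M+1) \times (M+1)$ matrix.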

Since the channel coefficients are statistically independent, the autocovariance matrix can be further decomposed up to the diagonal form

$$\mathbf{R}_H(k) = \sum_{i=1}^{L} \mathrm{diag}\left\{\phi^{m-1}_{i1}(k), \ldots, \phi^{m-1}_{iN}(k), \ldots, \phi^{0}_{i1}(k), \ldots, \phi^{0}_{iN}(k)\right\}. \qquad (4)$$

Moreover, assuming a statistically homogeneous scattering we can state that

$$\rho(k) = \sum_{i=1}^{L} \phi^l_{ij}(k) \quad \forall j, l, \qquad (5)$$

and simplify (4) as

$$\mathbf{R}_H(k) = \rho(k)\,\mathbf{I}_{Nm}.$$


3.1. Block-Adaptive Estimation of the Channel Statistics

Under assumption (5), the time-varying channel estimation filter given by (3) is fully characterized by the single scalar function $\rho(k)$. We show, in this section, that $\rho(k)$ can be estimated by means of the globally-convergent burst-adaptive updating rule

$$\tilde{\rho}_i(k) = \frac{1}{K}\sum_{n=0}^{K-k-1} I(n)\,\frac{\mathbf{x}^H(n)\mathbf{x}(n+k) - \delta(k)L\sigma_g^2}{\bar{\mathbf{s}}^H(n)\bar{\mathbf{s}}(n+k)} \qquad (6)$$

$$\hat{\rho}_i(k) = (1-\mu)\,\hat{\rho}_{i-1}(k) + \mu\,\tilde{\rho}_i(k) \qquad (7)$$

where $\mathbf{x}(0), \ldots, \mathbf{x}(K-1)$ is the i-th burst of observations, with associated data vectors $\bar{\mathbf{s}}(0), \ldots, \bar{\mathbf{s}}(K-1)$, $0 < \mu < 1$ is the step-size parameter, $\delta(\cdot)$ is Kronecker's delta function and $I(n) = 1 - \delta[\bar{\mathbf{s}}^H(n)\bar{\mathbf{s}}(n+k)]$ is an indicator function that avoids divisions by zero. The validity of algorithm (6)-(7) is guaranteed by the statistical results stated below.

Hypotheses: Let $\mathbf{x}(n) = \mathbf{H}(n)\bar{\mathbf{s}}(n) + \mathbf{g}(n)$ be a correlated stochastic process such that

(i) $\mathbf{H}(n)$ is a random $(L \times Nm)$-dimensional stationary process with autocovariance matrices $E[\mathbf{H}^H(n)\mathbf{H}(n+k)] = \rho(k)\,\mathbf{I}_{Nm}$.

(ii) $\bar{\mathbf{s}}(n)$ are vectors of deterministic symbols that belong to a finite alphabet with constant-modulus and finite-valued elements.

(iii) For the desired delay $k$, $\bar{\mathbf{s}}^H(n)\bar{\mathbf{s}}(n+k) \neq 0$, $\forall n$.

(iv) $\mathbf{g}(n)$ is a temporally white Gaussian process with moments $E[\mathbf{g}(n)] = \mathbf{0}$ and $E[\mathbf{g}^H(n)\mathbf{g}(n+k)] = \delta(k)\sigma_g^2 L$.

Lemma 1 Under assumptions (i)-(iv),

$$\tilde{\rho}(k) = \frac{1}{K}\sum_{n=0}^{K-k-1} \frac{\mathbf{x}^H(n)\mathbf{x}(n+k) - \delta(k)L\sigma_g^2}{\bar{\mathbf{s}}^H(n)\bar{\mathbf{s}}(n+k)}$$

is an estimator of $\rho(k)$ with the following properties

(i) $\tilde{\rho}(k)$ is asymptotically unbiased, i.e., $\lim_{K \to \infty} E[\tilde{\rho}(k)] = \rho(k)\ \forall k$,

(ii) $\tilde{\rho}(k \neq 0)$ is asymptotically consistent, i.e., $\lim_{K \to \infty} \mathrm{Var}[\tilde{\rho}(k)] = 0$, $k \neq 0$, where $\mathrm{Var}[\cdot]$ denotes variance, and

(iii) $\tilde{\rho}(0)$ is asymptotically consistent for high values of the SNR, i.e., $\lim_{K,\gamma \to \infty} \mathrm{Var}[\tilde{\rho}(0)] = 0$, where $\gamma$ is the signal-to-noise ratio in natural units.

Theorem 1 Under assumptions (i)-(iv), the adaptive rule (7) yields an estimate, $\hat{\rho}_i(k)$, of the channel autocovariance function that verifies

(i) $\lim_{i,K \to \infty} E[\hat{\rho}_i(k)] = \rho(k)\ \forall k$,

(ii) $\lim_{i,K \to \infty} \mathrm{Var}[\hat{\rho}_i(k)] = 0$, $k \neq 0$, and

(iii) $\lim_{i,K,\gamma \to \infty} \mathrm{Var}[\hat{\rho}_i(0)] = 0$.

The proofs are necessarily skipped due to lack of space, but they can be checked in [6]. The above results state that the correlation features of the stationary channel process, which are required in order to build the matrix filter (3), can be adequately estimated from the available observations as long as the observation interval is large enough.


4. DATA DETECTION

The linear channel estimator (3) depends on the transmitted symbols and, therefore, data detection and channel estimation should be carried out jointly. In this section, we describe an ISSE scheme that alternates LMMSE channel estimation and Decision Feedback (DF) MMSE soft data detection. In order to describe the DFMMSE detector, it is convenient to define the following stacked model for the received signal,

$$\mathbf{x}_a(n) = \mathbf{H}_a(n)\,\mathbf{s}_a(n) + \mathbf{g}_a(n),$$

where $a$ is a positive integer factor and

$$\mathbf{x}_a(n) = [\mathbf{x}^T(n) \cdots \mathbf{x}^T(n+a-1)]^T$$

$$\mathbf{H}_a(n) = \begin{bmatrix}
\mathcal{H}_{m-1}(n) & \mathcal{H}_{m-2}(n) & \cdots & \mathcal{H}_0(n) & \mathbf{0} & \cdots & \mathbf{0} \\
\mathbf{0} & \mathcal{H}_{m-1}(n) & \mathcal{H}_{m-2}(n) & \cdots & \mathcal{H}_0(n) & \cdots & \mathbf{0} \\
\vdots & & \ddots & & & \ddots & \vdots \\
\mathbf{0} & \cdots & \mathbf{0} & \mathcal{H}_{m-1}(n) & \mathcal{H}_{m-2}(n) & \cdots & \mathcal{H}_0(n)
\end{bmatrix}$$

$$\mathbf{s}_a(n) = [\mathbf{s}^T(n-m+1) \cdots \mathbf{s}^T(n+a-1)]^T$$

$$\mathbf{g}_a(n) = [\mathbf{g}^T(n) \cdots \mathbf{g}^T(n+a-1)]^T$$

are the $La \times 1$ stacked observation vector, $La \times N(m+a-1)$ channel matrix, $N(m+a-1) \times 1$ received symbol vector and $La \times 1$ AWGN vector, respectively.

Symbol estimates are computed as

$$\mathbf{y}(i,n) = \mathbf{F}^H(i,n)\,\mathbf{x}_a(i) + \mathbf{B}^H(i,n)\,\mathbf{s}_\beta(i,n) \qquad (8)$$

where $\mathbf{F}(i,n)$ and $\mathbf{B}(i,n)$ are forward and backward matrix filters with dimensions $La \times N$ and $(n+m-i-1) \times La$, respectively, and $\mathbf{s}_\beta(i,n) = [\mathbf{s}^T(i-m+1) \cdots \mathbf{s}^T(n-1)]$ is a $N \times (n+m-i-1)$ vector containing past hard symbol estimates provided by a simple scalar threshold detector. Note that, in order to estimate $\mathbf{s}(n)$, index $i$ must be in the range $n - a < i < n + m$. The forward and backward linear filters, $\mathbf{F}(i,n)$ and $\mathbf{B}(i,n)$, respectively, are selected according to the MMSE criterion as

$$\mathbf{F}(i,n), \mathbf{B}(i,n) = \arg\min \left\{E\left[\|\mathbf{y}(i,n) - \mathbf{s}(n)\|^2\right]\right\}.$$

Assuming that the transmitted symbols are temporally uncorrelated and statistically independent of the AWGN, it is straightforward to derive closed-form solutions for $\mathbf{F}(i,n)$ and $\mathbf{B}(i,n)$ using the available channel estimates, $\hat{\mathbf{H}}(n)$ [6].

The proposed ISSE performs joint channel estimation and data detection by iterating equations (3), (2) and (8) until convergence for each data burst. The availability of training data to obtain initial channel estimates is assumed. As data bursts are processed, algorithm (6)-(7) is used to estimate and track the channel process statistics.
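The burst-adaptive rule (6)-(7) used for tracking the channel statistics can be sketched in a few lines. The following is only an illustrative sketch under stated assumptions: the function names (`inner`, `burst_estimate`, `track`) are hypothetical, the noise variance is assumed known, and the stacked symbol vectors are assumed available (from training data or from previous detections).

```python
def inner(a, b):
    """Hermitian inner product a^H b of two equal-length vectors."""
    return sum(u.conjugate() * v for u, v in zip(a, b))


def burst_estimate(x, s, k, noise_var):
    """Per-burst estimate of rho(k), following rule (6).

    x: list of K observation vectors (each with L entries)
    s: list of K stacked symbol vectors (each with N*m entries)
    noise_var: sigma_g^2, assumed known in this sketch
    """
    K = len(x)
    L = len(x[0])
    total = 0.0
    for n in range(K - k):
        den = inner(s[n], s[n + k])
        if den == 0:
            # indicator I(n): skip terms that would divide by zero
            continue
        num = inner(x[n], x[n + k])
        if k == 0:
            num -= L * noise_var  # remove the noise power at lag zero
        total += num / den
    return total / K


def track(prev_estimate, burst_value, mu):
    """Exponential smoothing across bursts, rule (7), with 0 < mu < 1."""
    return (1.0 - mu) * prev_estimate + mu * burst_value
```

With a noise-free toy channel whose Gram matrix is a scaled identity, each valid term of the sum equals that scale factor, which gives a convenient check of the normalization.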
Fig. 2. (a) MSE improvement of LMMSE channel estimation with respect to BALS channel estimation as $\rho(k)$ is estimated. SNR=21 dB. (b) Channel estimation MSE for several values of the SNR. (c) Symbol estimation MSE for several values of the SNR. The forward and backward filters are $\mathbf{F}(n-2, n)$ and $\mathbf{B}(n-2, n)$, respectively, with $a = 6$.


5.  COMPUTER  SIMULATIONS 

Let us consider a system with $N = L = 3$ transmitting and receiving antennae, BPSK modulation and rectangular pulses. The burst length, per antenna, is $K = 800$ and the symbol period is $T = 4\ \mu$s. 16% equally-spaced pilot symbols are transmitted from every antenna. We assume a land mobile communication environment with the classical Rayleigh model of the power Doppler spectrum, which yields the autocovariance function $\phi^l_{ij}(k) = \sigma_{ij}^2 J_0(2\pi f_D k T)$. Here, $J_0(\cdot)$ is the zero-order Bessel function of the first kind, $f_D = \frac{v_m}{v_l} f_c$ is the maximum Doppler spread, $v_m = 120$ km/h is the motion speed of the transmitter, $v_l$ is the speed of light and $f_c = 2$ GHz is the carrier frequency. Using this model, we have simulated the type-B multipath channel defined by IMT-2000 for the vehicular environment [7] with 4 $\mu$s root mean-squared delay and a decreasing exponential delay power profile that yields a 16 $\mu$s maximum delay spread and significant ISI ($m = 4$). The homogeneous scattering assumption (5) does not hold in the described environment, but the simulation results show that the scalar autocovariance approximation, $\mathbf{R}_H(k) \approx \rho(k)\,\mathbf{I}_{Nm}$, still provides an adequate performance.
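To get a feel for how fast the channel decorrelates under this model, the Doppler spread and the Bessel-shaped autocovariance can be evaluated numerically. The sketch below is illustrative only: the function names are hypothetical, the speed of light is taken as $3 \times 10^8$ m/s, the variance is normalized to one, and $J_0$ is computed from its integral representation with a midpoint rule rather than taken from a library.

```python
import math


def bessel_j0(x, steps=2000):
    """J0(x) = (1/pi) * integral over [0, pi] of cos(x sin(theta)), midpoint rule."""
    h = math.pi / steps
    acc = sum(math.cos(x * math.sin((j + 0.5) * h)) for j in range(steps))
    return acc * h / math.pi


def doppler_hz(speed_kmh, carrier_hz, light_speed=3.0e8):
    """Maximum Doppler spread f_D = (v_m / v_l) * f_c, with v_m given in km/h."""
    return (speed_kmh / 3.6) / light_speed * carrier_hz


def jakes_autocov(k, f_d, T, var=1.0):
    """Autocovariance var * J0(2 pi f_D k T) at a lag of k symbol periods."""
    return var * bessel_j0(2.0 * math.pi * f_d * k * T)


f_d = doppler_hz(120.0, 2.0e9)   # roughly 222 Hz for the stated parameters
T = 4e-6                         # symbol period in seconds
for lag in (0, 50, 100, 200):
    print(lag, round(jakes_autocov(lag, f_d, T), 4))
```

Under these assumptions the normalized autocovariance is still above 0.7 at a lag of 200 symbols, which is the window length used in the comparison that follows.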

We  compare  the  performance  of  the  proposed  ISSE  with  a 
combination  of  Block-Adaptive  Least  Squares  (BALS)  channel 
estimation  [4,  5]  and  DFMMSE  data  detection.  The  performance 
limits,  i.e.,  the  MMSE  for  channel  and  symbol  estimation,  are 
also  plotted  as  a  reference.  The  K  =  800  symbol  frame  is 
divided  into  4  windows  that  only  overlap  in  one  symbol  (hence, 
M  =  200).  Both  LMMSE  and  BALS  channel  estimators  are 
applied  on  these  windows,  so  they  have  the  same  computational 
complexity.  Indeed,  with  this  setup,  using  BALS  is  the  same  as 
applying LMMSE with constant $\rho(k)$.

Figure 2(a) shows the Mean Squared Error (MSE) improvement of the proposed LMMSE channel estimator over the BALS estimator as $\rho(k)$ is adaptively estimated, starting with constant $\rho(k) = 1\ \forall k$. The MSE gain, in dB, of a sequence of channel estimates $\{\hat{\mathbf{H}}_0(n)\}_{n=0,\ldots,K-1}$ over another sequence $\{\hat{\mathbf{H}}_1(n)\}_{n=0,\ldots,K-1}$ is defined as $e_{0,1} = 10\log_{10}(\varepsilon_1/\varepsilon_0)$, where $\varepsilon_i = \frac{1}{K}\sum_{n=0}^{K-1} \mathrm{Trace}\left[\|\hat{\mathbf{H}}_i(n) - \mathbf{H}(n)\|^2\right]$ is the average channel MSE. It can be seen that the LMMSE estimates may attain up to a 15 dB improvement over the BALS estimates, when the Signal to Noise Ratio (SNR) value is 21 dB.

This improvement can also be observed in figure 2(b), which shows the channel MSE (after estimation of $\rho(k)$) for the iterative LMMSE and BALS methods, together with the MSE of the initial estimates (obtained via non-iterative BALS using training data) and the corresponding MMSE. The performance of the iterative BALS estimator degrades for higher SNR values due to severe error propagation.

Finally,  figure  2(c)  shows  the  symbol  MSE  attained  by 
the  ISSEs  for  several  values  of  the  SNR.  It  is  observed  that 
the  proposed  iterative  method  that  combines  LMMSE  channel 
estimation  and  DFMMSE  symbol  detection  practically  attains  the 
MMSE. 

6.  CONCLUSIONS 

We  have  introduced  an  ISSE  structure  that  performs  joint  LMMSE 
channel  estimation  and  soft  DFMMSE  data  detection  in  time- 
varying  MIMO  channels  with  ISI.  A  scalar  approximation  of 
the  channel  autocovariance  function  that  relies  on  the  statistical 
homogeneity  of  the  multipath  scattering  is  proposed,  together  with 
a  globally-convergent  burst-adaptive  algorithm  to  estimate  it. 
ABSTRACT

Chi and Chen recently reported a blind equalization algorithm using cumulant based multi-input, multi-output inverse filter criteria (MIMO-IFC) for multiuser DS/CDMA systems in multipath. Assuming that the user of interest is the weak user with signal power $\mathcal{E}_1$ and signal powers of all the interferers are identical, denoted $\mathcal{E}$, the performance of Chi and Chen's algorithm is superior to that of Tsatsanis and Xu's blind minimum variance (MV) equalizer for low near-far ratio (NFR) ($= \mathcal{E}/\mathcal{E}_1 \geq 1$). In this paper, two blind equalization algorithms, called Algorithms 2 and 3, also using cumulant based MIMO-IFC are proposed. The former (Algorithm 2) can improve the performance of the MV equalizer. The latter (Algorithm 3), based on the former, performs as well as Chi and Chen's algorithm for low NFR and outperforms Chi and Chen's algorithm and the MV equalizer for high NFR. Some simulation results are presented to support the efficacy of the proposed algorithms.

1.  INTRODUCTION 

Blind equalization of a multi-input multi-output (MIMO) linear time-invariant (LTI) system, denoted $\mathbf{H}[n]$ ($P \times K$ matrix), is a problem of estimating the vector input $\mathbf{u}[n] = (u_1[n], u_2[n], \ldots, u_K[n])^T$ ($K$ inputs) with only a set of non-Gaussian vector output measurements $\mathbf{x}[n] = (x_1[n], x_2[n], \ldots, x_P[n])^T$ ($P$ outputs) as follows [1-4]

$$\mathbf{x}[n] = \sum_{k=-\infty}^{\infty} \mathbf{H}[k]\,\mathbf{u}[n-k] + \mathbf{w}[n] = \sum_{k=1}^{K} \mathbf{y}_k[n] + \mathbf{w}[n] \qquad (1)$$

where $\mathbf{w}[n] = (w_1[n], w_2[n], \ldots, w_P[n])^T$ ($P \times 1$ vector) is additive noise and

$$\mathbf{y}_k[n] = \mathbf{h}_k[n] * u_k[n] = \sum_{l=-\infty}^{\infty} \mathbf{h}_k[n-l]\,u_k[l] \qquad (2)$$

(the contribution in $\mathbf{x}[n]$ from the input $u_k[n]$) in which $\mathbf{h}_k[n]$ is the kth column of $\mathbf{H}[n]$. Blind equalization of MIMO systems in multiuser detection of wireless communications includes suppression of multiple access interference (MAI) and removal of multipath effects that are crucial to the receiver design of multiuser communications systems.

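Let $\mathbf{v}[n] = (v_1[n], v_2[n], \ldots, v_P[n])^T$ denote a linear FIR equalizer of length $L = L_2 - L_1 + 1$ for which $\mathbf{v}[n] \neq \mathbf{0}$ for $n = L_1, L_1 + 1, \ldots, L_2$. Then the output $e[n]$ of the FIR equalizer (inverse filter) $\mathbf{v}[n]$ can be expressed as

$$e[n] = \sum_{k=L_1}^{L_2} \mathbf{v}^T[k]\,\mathbf{x}[n-k] = \sum_{j=1}^{P} \mathbf{v}_j^T \mathbf{x}_j[n] = \boldsymbol{\nu}^T \bar{\mathbf{x}}[n] \qquad (3)$$

where $\mathbf{v}_j = (v_j[L_1], v_j[L_1+1], \ldots, v_j[L_2])^T$, $\mathbf{x}_j[n] = (x_j[n-L_1], x_j[n-L_1-1], \ldots, x_j[n-L_2])^T$, $\bar{\mathbf{x}}[n] = (\mathbf{x}_1^T[n], \mathbf{x}_2^T[n], \ldots, \mathbf{x}_P^T[n])^T$ and

$$\boldsymbol{\nu} = (\mathbf{v}_1^T, \mathbf{v}_2^T, \ldots, \mathbf{v}_P^T)^T.$$

The design of the equalizer $\mathbf{v}[n]$ (or $\boldsymbol{\nu}$) such that $e[n] \approx \alpha u_{j_0}[n - \tau]$, where $\alpha \neq 0$, $\tau$ is an unknown integer and $u_{j_0}[n]$ is the signal of interest (SOI), is a widely known signal processing problem in wireless communications.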
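The first form of the equalizer output in (3) is a multichannel FIR filtering operation and can be sketched directly. This is only an illustrative sketch: the function name `equalizer_output` is hypothetical, the taps are stored in a list whose first element is the tap at lag $L_1$, and measurements outside the observed record are treated as zero.

```python
def equalizer_output(x, v, L1):
    """e[n] = sum over k = L1..L2 of v[k]^T x[n-k], as in the first form of (3).

    x: list of N measurement vectors, each a list of P numbers
    v: list of L = L2 - L1 + 1 tap vectors; v[t] holds the tap at lag L1 + t
    Samples x[n] outside 0..N-1 are taken as zero (an assumption of this sketch).
    """
    N = len(x)
    e = []
    for n in range(N):
        acc = 0
        for t, tap in enumerate(v):
            m = n - (L1 + t)
            if 0 <= m < N:
                # plain transpose (no conjugation), as written in (3)
                acc += sum(a * b for a, b in zip(tap, x[m]))
        e.append(acc)
    return e


# Two-channel toy record and a causal two-tap equalizer (L1 = 0, L2 = 1).
x = [[1, 2], [3, 4], [5, 6]]
v = [[1, 0], [0, 1]]
print(equalizer_output(x, v, 0))  # [1, 5, 9]
```

Stacking the taps of each channel into $\boldsymbol{\nu}$ and the delayed samples into $\bar{\mathbf{x}}[n]$ gives the same numbers as a single inner product per output sample.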

2.  REVIEW  OF  MIMO  INVERSE  FILTER 
CRITERIA  (MIMO-IFC) 

For ease of later use, let $\mathrm{cum}\{y_1, y_2, \ldots, y_p\}$ denote the pth-order cumulant [5] of random variables $y_1, y_2, \ldots, y_p$,

$$\mathrm{cum}\{y:p, \ldots\} = \mathrm{cum}\{y_1 = y, y_2 = y, \ldots, y_p = y, \ldots\}$$

$$C_{p,q}\{y\} = \mathrm{cum}\{y:p, y^*:q\}$$

where $y^*$ is the complex conjugate of $y$. Assume that we are given a set of measurements $\mathbf{x}[n]$, $n = 0, 1, \ldots, N-1$, modeled by (1) with the following assumptions:

(A1) $u_i[n]$ is zero-mean, independent identically distributed (i.i.d.), non-Gaussian and statistically independent of $u_k[n]$ for all $k \neq i$, and $C_{p,q}\{u_i[n]\} \neq 0$, $i = 1, 2, \ldots, K$ for a chosen $(p, q)$, where $p$ and $q$ are nonnegative integers and $p + q \geq 3$.

(A2) The MIMO system $\mathbf{H}[n]$ is exponentially stable.

(A3) The noise $\mathbf{w}[n]$ is zero-mean Gaussian (which can be spatially correlated and temporally colored) and statistically independent of $\mathbf{u}[n]$.

MIMO-IFC

Chi and Chen [1] proposed MIMO-IFC for the design of the equalizer $\mathbf{v}[n]$ by maximizing

$$J_{p,q}(\boldsymbol{\nu}) = \frac{|C_{p,q}\{e[n]\}|}{|C_{1,1}\{e[n]\}|^{(p+q)/2}} \qquad (4)$$

where $p$ and $q$ are nonnegative integers and $p + q \geq 3$. The obtained optimum $e[n]$ turns out to be an estimate of $u_j[n]$, $j \in \{1, 2, \ldots, K\}$ except for an unknown scale factor and an unknown time delay. Chi and Chen's MIMO-IFC include Tugnait's MIMO-IFC [3] for $(p, q) = (2, 1)$ and $(p, q) = (2, 2)$ as special cases.

Efficient Algorithm for MIMO-IFC

Recently, Chi and Chen [2] proposed a fast gradient type iterative MIMO-IFC algorithm with convergence speed, computational load, and amount of multichannel intersymbol interference (MISI) similar to those of MIMO super exponential algorithm (MIMO-SEA) [6] as follows:

Algorithm 1. Given $\boldsymbol{\nu}^{(i-1)}$ and $e^{(i-1)}[n]$ obtained at the $(i-1)$th iteration, $\boldsymbol{\nu}^{(i)}$ at the $i$th iteration is obtained through the following two steps.

(S1) Obtain $\boldsymbol{\nu}^{(i)}$ by

$$\boldsymbol{\nu}^{(i)} = \frac{\mathbf{R}^{-1}\mathbf{d}^{(i-1)}}{\|\mathbf{R}^{-1}\mathbf{d}^{(i-1)}\|} \qquad (5)$$

where $\mathbf{R} = E[\bar{\mathbf{x}}^*[n]\bar{\mathbf{x}}^T[n]]$, $\|\mathbf{a}\|$ denotes the Euclidean norm of vector $\mathbf{a}$ and

$$\mathbf{d}^{(i-1)} = \mathrm{cum}\{e^{(i-1)}[n]:r, (e^{(i-1)}[n])^*:s-1, \bar{\mathbf{x}}^*[n]\} \qquad (6)$$

where $r$ and $s - 1$ are nonnegative integers, $r + s = p + q$ as $\mathbf{x}[n]$ is real and $r = s = p = q$ as $\mathbf{x}[n]$ is complex.

(S2) If $J_{p,q}(\boldsymbol{\nu}^{(i)}) > J_{p,q}(\boldsymbol{\nu}^{(i-1)})$, go to the next iteration, otherwise update $\boldsymbol{\nu}^{(i)}$ through a gradient type optimization algorithm such that $J_{p,q}(\boldsymbol{\nu}^{(i)}) > J_{p,q}(\boldsymbol{\nu}^{(i-1)})$, and obtain the associated $e^{(i)}[n]$.

Channel Estimation and Signal Cancellation

With the obtained $e[n]$ (estimate of $u_j[n]$ up to a scale factor and a time delay where $j$ is unknown) using Algorithm 1, $\mathbf{h}_j[k]$ can be estimated as [3]

$$\hat{\mathbf{h}}_j[k] = \frac{E[\mathbf{x}[n]\,e^*[n-k]]}{E[|e[n]|^2]} \qquad (7)$$

Therefore, the contribution in $\mathbf{x}[n]$ due to $u_j[n]$ can be estimated as (see (2))

$$\hat{\mathbf{y}}_j[n] = \hat{\mathbf{h}}_j[n] * e[n]. \qquad (8)$$

Removing $\hat{\mathbf{y}}_j[n]$ from the data $\mathbf{x}[n]$ yields

$$\tilde{\mathbf{x}}[n] = \mathbf{x}[n] - \hat{\mathbf{y}}_j[n] = \mathbf{x}[n] - \hat{\mathbf{h}}_j[n] * e[n] \qquad (9)$$

that corresponds to the outputs of a $P \times (K-1)$ system driven by $(K-1)$ inputs $u_i[n]$, $i = 1, \ldots, j-1, j+1, \ldots, K$.
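The channel estimate (7) and the cancellation step (8)-(9) can be sketched with sample averages in place of the expectations. The sketch below is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the range of lags is chosen by the caller, and samples outside the record are ignored.

```python
def estimate_channel(x, e, k_min, k_max):
    """Sample-average version of (7): h[k] = E[x[n] e*[n-k]] / E[|e[n]|^2].

    x: list of N measurement vectors (P entries each)
    e: list of N equalizer output samples
    Returns a dict mapping each lag k to a list of P channel coefficients.
    """
    N, P = len(x), len(x[0])
    power = sum(abs(v) ** 2 for v in e) / N
    h = {}
    for k in range(k_min, k_max + 1):
        acc = [0j] * P
        for n in range(N):
            if 0 <= n - k < N:
                c = e[n - k].conjugate()
                for p in range(P):
                    acc[p] += x[n][p] * c
        h[k] = [a / (N * power) for a in acc]
    return h


def cancel(x, e, h):
    """Subtract the estimated contribution sum_k h[k] e[n-k] from x[n], as in (8)-(9)."""
    N, P = len(x), len(x[0])
    out = [list(row) for row in x]
    for k, hk in h.items():
        for n in range(N):
            if 0 <= n - k < N:
                for p in range(P):
                    out[n][p] -= hk[p] * e[n - k]
    return out
```

For a single user observed through a one-tap channel and a perfect equalizer output, the lag-zero estimate reproduces the channel and the residual after cancellation vanishes, which is a convenient check of the normalization.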


3.  MIMO  CHANNEL  MODELS  FOR 
MULTIUSER  DS/CDMA  SYSTEMS  IN 
MULTIPATH 

Consider a $K$-user asynchronous DS/CDMA system. Assume that

$$\mathcal{R} = \{c_k[n],\ k = 1, 2, \ldots, K,\ n = 0, 1, \ldots, P-1\} \qquad (10)$$

is the set of the $K$ active users' signature sequences (binary sequences of $\{+1, -1\}$) with spreading factor equal to $P$ ($\geq K$). Let $x[n]$ and $w[n]$ be discrete-time signals by sampling the received continuous time signal $x(t)$ and Gaussian noise $w(t)$ with sampling interval $T_c$ (chip period), respectively. Two MIMO models are considered as follows.

MIMO Model I: Polyphase decomposition [1,4]

$$\mathbf{x}^{(1)}[n] = \sum_{k=-\infty}^{\infty} \mathbf{H}^{(1)}[k]\,\mathbf{u}[n-k] + \mathbf{w}^{(1)}[n] \qquad (11)$$

where $\mathbf{x}^{(1)}[n] = (x[nP], x[nP+1], \ldots, x[nP+P-1])^T$, $\mathbf{w}^{(1)}[n] = (w[nP], w[nP+1], \ldots, w[nP+P-1])^T$ is a white Gaussian vector random process, $\mathbf{u}[n] = (u_1[n], u_2[n], \ldots, u_K[n])^T$ where $u_i[n]$ is the symbol sequence of user $i$, and $\mathbf{H}^{(1)}[n]$ is a $P \times K$ impulse response matrix with the $i$th column $\mathbf{h}_i^{(1)}[n]$ and the $(i, k)$th entry equal to

$$h^{(1)}_{i,k}[n] = h_k[nP + i - 1] \qquad (12)$$

in which $h_k[n]$ is the signature waveform of user $k$ given by

$$h_k[n] = c_k[n] * g_k[n] = \sum_{l=0}^{P-1} c_k[l]\,g_k[n-l] \qquad (13)$$

where $g_k[n]$ is an FIR multipath channel of order $q_g$ for user $k$.
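Equations (12) and (13) amount to a convolution followed by a regrouping of the chip-rate samples into blocks of $P$. A minimal sketch, with hypothetical function names and zero padding of the last block as an assumption, is:

```python
def signature_waveform(c, g):
    """h_k[n] = sum_l c_k[l] g_k[n-l], as in (13): full linear convolution."""
    h = [0] * (len(c) + len(g) - 1)
    for l, cl in enumerate(c):
        for m, gm in enumerate(g):
            h[l + m] += cl * gm
    return h


def polyphase_column(h, P):
    """Column of H^(1)[n] for one user, following (12).

    Returns a list indexed by the symbol lag n; element n is the length-P
    vector whose entry i (1-based) equals h[nP + i - 1]. The tail is
    zero padded to a whole number of blocks (an assumption of this sketch).
    """
    lags = -(-len(h) // P)  # ceiling division
    padded = h + [0] * (lags * P - len(h))
    return [padded[n * P:(n + 1) * P] for n in range(lags)]


# Toy example with spreading factor P = 2 and a two-tap multipath channel.
h = signature_waveform([1, -1], [1, 0.5])
print(h)                       # [1, -0.5, -0.5]
print(polyphase_column(h, 2))  # [[1, -0.5], [-0.5, 0]]
```

The second block is nonzero because the multipath channel spreads the signature beyond one symbol period, which is the source of intersymbol interference in Model I.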

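MIMO Model II:

Tsatsanis and Xu's minimum variance (MV) equalizer [4] estimates $u_i[n]$ by

$$\hat{u}_{\mathrm{MV},i}[n] = \mathbf{v}_{\mathrm{MV},i}^H\,\check{\mathbf{x}}[n] \qquad (14)$$

where the superscript '$H$' denotes complex conjugate transpose,

$$\check{\mathbf{x}}[n] = (x[nP], x[nP+1], \ldots, x[nP+P+q_g-1])^T \qquad (15)$$

and

$$\mathbf{v}_{\mathrm{MV},i} = \mathbf{R}^{-1}\mathbf{C}_i(\mathbf{C}_i^H\mathbf{R}^{-1}\mathbf{C}_i)^{-1}\hat{\mathbf{g}}_i \qquad (16)$$

in which $\mathbf{R} = E[\check{\mathbf{x}}[n]\check{\mathbf{x}}^H[n]]$, $\mathbf{C}_i$ is a $(P + q_g) \times (q_g + 1)$ matrix constituted by $c_i[n]$ and $\hat{\mathbf{g}}_i$ is an estimate of $\mathbf{g}_i = (g_i[0], g_i[1], \ldots, g_i[q_g])^T$ obtained as the eigenvector of $\mathbf{C}_i^H\mathbf{R}^{-1}\mathbf{C}_i$ associated with the smallest eigenvalue. Concatenating $\hat{u}_{\mathrm{MV},i}[n]$, $i = 1, 2, \ldots, K$ yields

$$\mathbf{x}^{(2)}[n] = (\hat{u}_{\mathrm{MV},1}[n], \hat{u}_{\mathrm{MV},2}[n], \ldots, \hat{u}_{\mathrm{MV},K}[n])^T = \mathbf{H}^{(2)}[n] * \mathbf{u}[n] + \mathbf{w}^{(2)}[n] \qquad (17)$$

where $\mathbf{H}^{(2)}[n]$ and $\mathbf{w}^{(2)}[n]$ are $K \times K$ system and $K \times 1$ spatially correlated and temporally colored Gaussian noise, respectively. It can be shown that

$$\mathcal{E}_{ii} = \sum_n |h^{(2)}_{ii}[n]|^2 \gg \mathcal{E}_{ij} = \sum_n |h^{(2)}_{ij}[n]|^2, \quad \forall i \neq j \qquad (18)$$

where $h^{(2)}_{ij}[n]$ is the $(i, j)$th entry of $\mathbf{H}^{(2)}[n]$. A worthy remark about the above two MIMO models is as follows:

(R1) Algorithm 1 can be employed to process either of $\mathbf{x}^{(1)}[n]$ and $\mathbf{x}^{(2)}[n]$ for obtaining one input estimate $\hat{u}_j[n]$ where $j$ is unknown. The identification of user number $j$ associated with $\mathbf{x}^{(1)}[n]$ has been reported in [1], while that associated with $\mathbf{x}^{(2)}[n]$ is based on (18) as follows:

$$j = \arg\max_{1 \leq i \leq K} \{\hat{\mathcal{E}}_{ij}\} \qquad (19)$$

where $\hat{\mathcal{E}}_{ij}$ is obtained by (18) with $h^{(2)}_{ij}[n]$ replaced by the $i$th entry $\hat{h}^{(2)}_i[n]$ of the channel estimate $\hat{\mathbf{h}}^{(2)}[n]$ (see (7)).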
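The identification rule (19) only needs the energy of each entry of the estimated channel column. A minimal sketch, in which the function name `identify_user` and the data layout (one length-$K$ vector per lag) are assumptions made for illustration, is:

```python
def identify_user(h_hat):
    """Pick the user index with the largest channel-entry energy, as in (19).

    h_hat: list over lags n of length-K vectors, the estimated channel
           column obtained from the Model II data.
    Returns (j, energies), with j counted from 1 as in the text.
    """
    K = len(h_hat[0])
    energies = [sum(abs(tap[i]) ** 2 for tap in h_hat) for i in range(K)]
    best = max(range(K), key=lambda i: energies[i])
    return best + 1, energies


# Toy estimate over two lags for K = 3: the second entry clearly dominates.
j, energies = identify_user([[0.1, 0.9, 0.0], [0.0, 0.3, 0.2]])
print(j)  # 2
```

The rule relies on the diagonal dominance stated in (18), so it is only as reliable as the MV pre-processing that produces the Model II data.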

4.  NEW  ALGORITHMS  FOR  BLIND 
EQUALIZATION  OF  DS/CDMA  SYSTEMS 

Assuming that the SOI is $u_1[n]$, Chi and Chen's algorithm [1], a multistage successive cancellation algorithm, obtains $\hat{u}_1[n]$ by processing $\mathbf{x}^{(1)}[n]$. This algorithm can be extended by processing either of $\mathbf{x}^{(1)}[n]$ given by (11) and $\mathbf{x}^{(2)}[n]$ given by (17) through the following three signal processing steps at each stage:

Algorithm 2. (with $l = 1$ or $l = 2$)

(V1) Process $\mathbf{x}^{(l)}[n]$ to obtain a local optimum $\boldsymbol{\nu}$ (and $\mathbf{v}[n]$) of $J_{p,q}(\boldsymbol{\nu})$ using Algorithm 1 and the associated $e[n]$ and $\hat{\mathbf{h}}_j^{(l)}[n]$ (see (7)).

(V2) Update $\mathbf{x}^{(l)}[n]$ by $\mathbf{x}^{(l)}[n] - \hat{\mathbf{h}}_j^{(l)}[n] * e[n]$ (see (9)) (i.e., signal cancellation).

(V3) Identify the user number $j$ as presented in (R1). If $j = 1$, $\hat{u}_1[n] = e[n]$ (i.e., $\hat{u}_1[n]$ has been obtained).

Three worthy remarks with regard to Algorithm 2 are as follows.


(R2) The smaller the stage number $k$ at which $\hat{u}_1[n]$ is obtained, the better the performance of Algorithm 2 due to error propagation effects resulting from imperfect cancellation in (V2). However, the stage number $k$ at which $\hat{u}_1[n]$ is obtained is dependent upon the initial condition $\boldsymbol{\nu}^{(0)}$, which can be chosen as the least square solution of the decorrelating constraint as reported in [1] for $l = 1$ and as the one associated with

$$\mathbf{v}^{(0)}[n] = \mathbf{1}_{K \times 1} \cdot \delta[n - n_0], \quad L_1 \leq n_0 \leq L_2 \qquad (20)$$

for $l = 2$ where $\mathbf{1}_{K \times 1}$ is a $K \times 1$ unit column vector with the first entry equal to unity.

(R3) Algorithm 2 for $l = 1$ is exactly the same as Chi and Chen's algorithm. Algorithm 2 for $l = 2$ further processes the MV estimate $\hat{u}_{\mathrm{MV},1}[n]$ (see (14) and (17)), and therefore, its performance is superior to that of the MV equalizer.

(R4) By our experience, the performance of Algorithm 2 is better for $l = 1$ than for $l = 2$ for low near-far ratio (NFR), whereas it is better for $l = 2$ than for $l = 1$ for high NFR.

Next, let us present a hybrid algorithm using Algorithm 2 with $l = 1$ and $l = 2$ based on (R4). The proposed algorithm obtains the SOI estimate $\hat{u}_1[n]$ through the following two steps.

Algorithm 3.

(T1) Set $k = 1$, perform all the steps of Algorithm 2 for $l = 1$ (identical with Chi and Chen's algorithm).

(T2) If $j \neq 1$ (i.e., $\hat{u}_1[n]$ has not been obtained in (T1)), then perform all the steps of Algorithm 2 for $l = 2$ for all the ensuing stages $k \geq 2$ until $\hat{u}_1[n]$ is obtained.

A worthy remark regarding Algorithm 3 is as follows.

(R5) Assume that Algorithm 3 obtains $\hat{u}_1[n]$ at the $k_0$th stage. The obtained $\hat{u}_1[n]$ for $k_0 = 1$ (i.e., $j = 1$ in (T1)) (that usually happens for low NFR) is identical to that obtained by Chi and Chen's algorithm for $k = 1$, while that for $k_0 \geq 2$ (i.e., $j \neq 1$ in (T1)) (that usually happens for high NFR) is identical to the one obtained at the $(k_0 - 1)$th stage of Algorithm 2 for $l = 2$. Therefore, Algorithm 3 performs better than Algorithm 2 regardless of $l$ by (R4).
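The control flow of steps (T1)-(T2) can be summarized as follows. This is only a sketch of one reading of the steps above: `run_stage` is a hypothetical callable standing for one full stage (V1)-(V3) of Algorithm 2, and, in line with (R5), the Model II data are assumed to be processed from scratch when the first stage does not return user 1.

```python
def algorithm3(x1, x2, run_stage, max_stages):
    """Hybrid multistage flow of (T1)-(T2).

    x1, x2: data for MIMO Model I and Model II
    run_stage(x, model) -> (e, x_updated, j): one stage of Algorithm 2,
        returning the extracted signal, the data after cancellation and
        the identified user number (hypothetical interface).
    Returns (estimate of user 1, stage number), or (None, None) on failure.
    """
    # (T1): stage k = 1 on the Model I data
    e, _, j = run_stage(x1, 1)
    if j == 1:
        return e, 1
    # (T2): ensuing stages k >= 2 on the Model II data
    x = x2
    for k in range(2, max_stages + 1):
        e, x, j = run_stage(x, 2)
        if j == 1:
            return e, k
    return None, None


def fake_stage(x, model):
    """Stand-in for a real stage: x lists user numbers in extraction order."""
    return "u%d" % x[0], x[1:], x[0]


print(algorithm3([3, 1, 2], [2, 1, 3], fake_stage, 3))  # ('u1', 3)
```

In the toy run the first stage extracts user 3, so the flow switches to the Model II data and reaches user 1 at the third stage, i.e., at the second stage of Algorithm 2 with $l = 2$.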

5.  SIMULATION  RESULTS 

An asynchronous DS/CDMA channel for six users ($K = 6$) taken from [1] was considered. The users' spreading codes $c_i[n]$ were Gold codes of length $P = 31$. Input signals $u_i[n]$, $i = 1, 2, \ldots, K$ were assumed to be equally probable binary random sequences of $\{+1, -1\}$ whose amplitudes were adjusted such that $\mathcal{E}_i = E[\|\mathbf{h}_i^{(1)}[n] * u_i[n]\|^2] = \mathcal{E}$, $i = 2, 3, \ldots, 6$. The synthetic data $\mathbf{x}^{(1)}[n]$ for $N = 2500$, different values of NFR ($= \mathcal{E}/\mathcal{E}_1 = 0, 10$ dB) and different values of SNR ($= \mathcal{E}_1/E[\|\mathbf{w}^{(1)}[n]\|^2] = 3, 5, 7, 9, 11, 13$ dB) were processed by both Algorithms 2 and 3 with $p = q = r = s = 2$, respectively, and with the length of the causal FIR inverse filter $\mathbf{v}[n]$ equal to three.
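For the setting $p = q = 2$ used in these simulations, criterion (4) is a normalized fourth-order cumulant (kurtosis magnitude) of the equalizer output. The sketch below estimates it from samples; the function name `ifc_22` is hypothetical, the sequence is assumed zero-mean, and expectations are replaced by sample averages.

```python
def ifc_22(e):
    """Sample estimate of J_{2,2} = |C_{2,2}{e}| / |C_{1,1}{e}|^2 for zero-mean e.

    Uses C_{1,1}{e} = E|e|^2 and, for a zero-mean sequence,
    C_{2,2}{e} = E|e|^4 - 2 (E|e|^2)^2 - |E[e^2]|^2.
    """
    N = len(e)
    m2 = sum(abs(v) ** 2 for v in e) / N   # E|e|^2
    m4 = sum(abs(v) ** 4 for v in e) / N   # E|e|^4
    p2 = sum(v * v for v in e) / N         # E[e^2]
    c22 = m4 - 2 * m2 ** 2 - abs(p2) ** 2
    return abs(c22) / m2 ** 2


print(ifc_22([1, -1, 1, -1]))    # binary sequence: 2.0
print(ifc_22([1, 1j, -1, -1j]))  # four-phase sequence: 1.0
```

The value does not change when the sequence is scaled, which reflects the scale ambiguity of the equalizer output mentioned alongside (4).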

Figures 1(a) and 1(b) show the output signal-to-interference-plus-noise ratio (SINR) of user 1 (the weak user) for NFR = 0 dB and NFR = 10 dB, respectively, associated with the nonblind linear minimum mean square error (LMMSE) equalizer (which has maximum output SINR) (solid lines), Algorithm 3 ('△'), Algorithm 2 for $l = 1$ ('O') (identical with Chi and Chen's algorithm), Algorithm 2 for $l = 2$ ('X') and the MV equalizer ('□'). All the results of Algorithm 3 were obtained at the stage $k_0 = 1$ for NFR = 0 dB and $k_0 = 2$ for NFR = 10 dB. One can observe, from these figures, that Algorithm 2 for $l = 2$ performs better than the MV equalizer (see (R3)), and that Algorithm 3 performs as well as Algorithm 2 with $l = 1$ (identical with Chi and Chen's algorithm) for NFR = 0 dB (low NFR) and Algorithm 2 with $l = 2$ for NFR = 10 dB (high NFR), respectively. These results are consistent with (R4) and (R5) and support that Algorithm 3 outperforms Chi and Chen's algorithm and the MV equalizer.

6.  CONCLUSIONS 

We have proposed two blind equalization algorithms, Algorithms 2 and 3, for multiuser asynchronous DS/CDMA systems in multipath. Algorithm 2, an extension of Chi and Chen's algorithm, can improve the performance of Tsatsanis and Xu's blind MV equalizer. Algorithm 3, based on Algorithm 2, outperforms Chi and Chen's algorithm and Tsatsanis and Xu's blind MV equalizer. Some simulation results were presented to support the efficacy of the two proposed algorithms.
Fig. 1. Simulation results for a 6-user case with respective powers $\mathcal{E}_1$ and $\mathcal{E}_2 = \mathcal{E}_3 = \mathcal{E}_4 = \mathcal{E}_5 = \mathcal{E}_6 = \mathcal{E}$, including output SINR of user 1 (the weak user) associated with the LMMSE equalizer (solid line), Algorithm 3 ('△'), Algorithm 2 for $l = 1$ ('O') (i.e., Chi and Chen's algorithm), Algorithm 2 for $l = 2$ ('X') and the MV equalizer ('□') for (a) NFR = 0 dB and (b) NFR = 10 dB, respectively.
1. INTRODUCTION

To obtain higher capacity, smart antenna techniques have been investigated for wireless communications. Most smart antenna techniques deal with signals from receive antenna arrays, but similar techniques can be used for transmitting signals using a transmit antenna array. In fact, the use of a transmit antenna array (TAA) at the base station for 3rd Generation (3G) CDMA systems has recently attracted a lot of attention [3], [6], [2]. Indeed, even if in theory the orthogonal property of Walsh codes allows mitigating multiuser interference at the mobile station, the multipath propagation phenomenon introduces a non-negligible interference level within the same cell, hence the advantage of downlink beamforming. In this paper, we focus our study on very high data rate 3G CDMA applications where spreading gains can be quite low (down to 4). In such a situation, the performance of the conventional Rake receiver is significantly degraded. This is because the Rake receiver is only effective in situations where multipath effects result in Inter Chip Interference (ICI) with a small amount of Inter Symbol Interference (ISI). However, as the data rates increase (and the spreading gains decrease), the amount of ISI becomes less and less negligible, which prevents the despreading operation of the Rake receiver from being successful. Therefore, in such a situation, downlink beamforming can be very useful. Indeed, it can reduce the amount of multiuser interference at the receiver. In fact, the beamformer should transmit signals in the direction of multipaths less affected by multiuser interference.

In order to construct the downlink beampattern, one should utilize
knowledge of the downlink channel as well as the bearings of the
downlink signal. However, in the frequency division duplexing
(FDD) mode, since the downlink and uplink channels are different,
the base station does not have access to the downlink channel.
This requires downlink channel information to be fed back from the
mobile station to the base station. Various issues associated with
channel information feedback for downlink beamforming have been
addressed in [6], where channel estimation was provided by the
Rake receiver. For high data rates, we have seen above that
channel estimation cannot rely on the Rake receiver. Instead, it
has to be provided by a scheme operating at the chip level.
However, if the multiuser interference on the downlink is large,
such a scheme may provide a poor channel estimate before the
beamformer is correctly set up. In turn, one cannot expect the
beamforming operation to be fully efficient if it relies on a poor
channel estimate. This illustrates the fact that, in this case,
channel estimation and beamforming strongly depend on each other,
mainly because the channel seen at the mobile station is in fact a
combination of the propagation channel and the beamformer. Based
on this simple observation, we propose in this paper a new method
where the channel estimation and beamforming operations are
iterated several times until convergence to a fixed solution.

2.  PROBLEM  FORMULATION 

In this paper, we consider the transmission of a CDMA signal
through L transmit antennas. We consider here the baseband channel
model (at the chip rate) where the channel memory M is greater
than the processing gain (although the formulation still holds in
the general case), as is likely to occur for high data rate
services. Let us define the notations used throughout this paper:

- $h_q[l]$ is the L x 1 channel vector at symbol time l for the
qth path and is written as $h_q[l] = \beta_q[l]\, a(\theta_q[l])$,
where $a(\theta_q[l])$ is the L x 1 known steering vector
associated with the bearing $\theta_q[l]$ of the qth path and
$\beta_q[l]$ is the attenuation factor for this path. Note that
throughout this paper, we will assume that the channel
characteristics are time invariant. Therefore, the time index l
will not be used hereafter. It is also important to note that with
this model, $\beta_q[l]$ could very well be zero (no multipath
component).

- H is the L x M channel matrix defined as:

$$H = [h_1 \; h_2 \; \cdots \; h_M]$$

- w is the L x 1 vector of coefficients of the beamformer.

- x[m] is the sequence of chips after spreading.

- The general notation $v^{(i)}$ represents the estimate of the
quantity v after the ith iteration of the algorithm.

- Throughout this paper, the superscript $^H$ denotes the
Hermitian (conjugate transpose) operator.

Using the above notations, the signal (at the chip rate) received
at the desired mobile station is written as:

$$y[l] = w^H \sum_{q=1}^{M} h_q\, x[l-q] + n[l] \qquad (1)$$

where n[l] is considered as a Gaussian noise of variance
$\sigma^2$ comprising the thermal noise effects as well as the
multiuser interference. It is worth pointing out that the channel
viewed at the mobile station is in fact the combination of the
beamforming coefficients and the propagation channel. Therefore,
at the mobile station, the channel estimation will provide an
estimate of $H^H w = \mathcal{H}(w)$.

The  problem  to  be  solved  is  the  following: 
assuming  initial  knowledge  of  the  bearings,  iteratively 
find  the  optimal  set  of  coefficients  w  with  the  associated 
channel  estimates  such  that  the  multi-user  interference 
at  the  mobile  is  minimised. 

3.  THE  ITERATIVE  ALGORITHM 
The proposed method involves the following steps:

1. Initialisation:

• at the base station, w is initialised to $w^{(0)}$ using the
knowledge of the bearings.

• at the mobile station, $\mathcal{H}(w)$ is initialised with a
random value $\mathcal{H}(w)^{(0)}$.

2. At iteration i:

• at the mobile station, estimate the downlink channel vector
$\mathcal{H}(w^{(i)})$ (see section 5).

• feed this estimate back to the base station.

• at the base station, compute $H^{(i+1)}$ using
$\mathcal{H}(w^{(i)})$ and $w^{(i)}$.

• given $H^{(i+1)}$, compute at the base station the corresponding
beamformer coefficients $w^{(i+1)}$ (see section 4).

3. Termination:

When $\|H^{(i)} - H^{(i+1)}\| < \epsilon$, perform Maximum A
Posteriori (MAP) detection of non-coded bits.
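
The loop structure of these steps can be sketched as follows. This is only an illustration under several assumptions that are not taken from the paper: the steering vectors are those of a half-wavelength uniform linear array, the HMM/EM channel estimator at the mobile is replaced by a caller-supplied function (`estimate_effective_channel`, a hypothetical name), and the base station recovers each path gain by dividing the fed-back tap by $a(\theta_q)^H w$, which follows from $\mathcal{H}(w) = H^H w$ with known steering vectors but is not spelled out in the text.

```python
import numpy as np

def steering_matrix(bearings_rad, n_antennas):
    """Columns a(theta_q) of a half-wavelength uniform linear array (assumed geometry)."""
    k = np.arange(n_antennas)[:, None]
    return np.exp(1j * np.pi * k * np.sin(bearings_rad)[None, :])

def principal_eigenvector(H):
    """Unit-norm maximiser of w^H H H^H w (eigh sorts eigenvalues in ascending order)."""
    _, vecs = np.linalg.eigh(H @ H.conj().T)
    return vecs[:, -1]

def iterate(A, estimate_effective_channel, eps=1e-9, max_iter=20):
    """Alternate channel estimation (mobile) and beamforming (base station)."""
    L, M = A.shape
    w = A.sum(axis=1)
    w = w / np.linalg.norm(w)                      # w^(0) built from the known bearings
    H = np.zeros((L, M), dtype=complex)
    for _ in range(max_iter):
        h_eff = estimate_effective_channel(w)      # mobile: estimate of H^H w, fed back
        beta = np.conj(h_eff / (A.conj().T @ w))   # assumed step: undo the beamformer
        H_next = A * beta[None, :]                 # H^(i+1) = [beta_q a(theta_q)]
        w = principal_eigenvector(H_next)          # w^(i+1), equation 2
        converged = np.linalg.norm(H - H_next) < eps
        H = H_next
        if converged:
            break
    return H, w
```

With a noise-free stand-in estimator such as `lambda w: H_true.conj().T @ w`, the loop recovers `H_true` and stops after two passes; the behaviour described in the paper, where the beam keeps scanning until the interference variance drops, only appears once a real estimator with an interference-dependent variance is plugged in.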


4.  BEAMFORMING 

For the beamforming technique, we have selected the algorithm
presented in [6], which maximises the Signal to Noise Ratio
experienced at the mobile station. The beamformer coefficients are
simply computed by:

$$w^{(i)} = \arg\max_{w} \; w^H H^{(i)} H^{(i)H} w \qquad (2)$$

which means that $w^{(i)}$ is the eigenvector associated with the
largest eigenvalue of the matrix $H^{(i)} H^{(i)H}$.
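
A short justification of this eigenvector statement, assuming (as is usual, although not stated explicitly here) that w is constrained to unit norm. The symbols $\lambda_k$ and $u_k$ are introduced only for this derivation and denote the eigenvalues and orthonormal eigenvectors of $H^{(i)} H^{(i)H}$:

```latex
\begin{aligned}
H^{(i)} H^{(i)H} &= \sum_{k=1}^{L} \lambda_k\, u_k u_k^H,
  \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_L \ge 0, \\
w^H H^{(i)} H^{(i)H} w &= \sum_{k=1}^{L} \lambda_k\, |u_k^H w|^2
  \;\le\; \lambda_1 \sum_{k=1}^{L} |u_k^H w|^2
  \;=\; \lambda_1 \|w\|^2 \;=\; \lambda_1 ,
\end{aligned}
```

with equality when $w = u_1$, so the maximiser is the eigenvector of the largest eigenvalue and the achieved value is $\lambda_1$.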

5.  CHANNEL  ESTIMATION  AND  DATA 
DETECTION 

We have selected a channel estimation scheme based on a Hidden
Markov Model (HMM) and the Expectation Maximisation (EM)
algorithm, as previously proposed for CDMA systems in [4]. This
choice is motivated by the fact that it is possible to include a
measurement of the multiuser interference as one of the parameters
to be estimated (namely the variance of the noise n[l] from
equation 1). This is particularly useful in our case because it
can be shown that this variance estimate drives the convergence of
the EM algorithm in such a way that the only stable solution of
the overall iterative scheme corresponds to a small value of the
variance estimate. This is a key feature of this scheme since a
small level of multiuser interference means that the final
beamformer has nulls in the directions where the interference is
the highest. In this paper, we do not derive in full detail the EM
algorithm applied to channel estimation and data detection, which
can be found in [1]. We only report the main results which are
useful for understanding the proposed iterative method.

Consider the vector Y[l] containing N consecutive observations:
$Y[l] = [\, y[Nl] \;\; y[Nl-1] \;\; \cdots \;\; y[N(l-1)+1] \,]^T$.
At chip time Nl - j, let us write the modulated chip x[Nl - j] as
a function of a spreading code chip and a non-coded symbol.
Denoting by a and a' the integers such that j = Na + a' with
a' < N, we can easily check that

$$x[Nl - j] = s[l - a]\, c[a'] \qquad (3)$$
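
Equation 3 can be checked numerically with a few lines of Python. The sketch assumes the chip-indexing convention implied by the observation vector above, namely that symbol s[l] occupies chip times Nl, Nl - 1, ..., N(l - 1) + 1 with x[Nl - a'] = s[l] c[a']; the function names are illustrative only.

```python
def spread(symbols, code):
    """Chip sequence as a dict keyed by chip time, assuming x[N*l - a'] = s[l] * c[a']."""
    N = len(code)
    return {N * l - a_prime: s * c
            for l, s in enumerate(symbols)
            for a_prime, c in enumerate(code)}

def chip_from_symbol(symbols, code, l, j):
    """Right-hand side of equation 3: s[l - a] * c[a'] with j = N*a + a', 0 <= a' < N."""
    a, a_prime = divmod(j, len(code))
    return symbols[l - a] * code[a_prime]
```

Looking up `spread(symbols, code)[N * l - j]` and calling `chip_from_symbol(symbols, code, l, j)` give the same chip for any delay j that stays inside the transmitted block.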

Therefore, if the Inter Chip Interference is of length M, the
Inter Symbol Interference will be of length m + 1, with m the
integer such that M = Nm + m' with m' < N. Denoting by
$\mathbf{s}[l]$ the vector $[\, s[l] \;\; s[l-1] \;\; \cdots \;\; s[l-m] \,]$,
one can rewrite equation 1 as

$$Y[l] = \mathcal{F}_c(\mathbf{s}[l])\, \mathcal{H}(w) + N[l] \qquad (4)$$

where the function $\mathcal{F}_c(\mathbf{s}[l])$ is only used for
the purpose of showing that the observation vector Y[l] depends on
the spreading code c and the non-coded sequence of transmitted
symbols $[\, s[l] \;\; s[l-1] \;\; \cdots \;\; s[l-m] \,]$.

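It is easy to show that the vector $\mathbf{s}[l]$ is a first
order Markov process: it obeys the state equation

$$\mathbf{s}[l+1] =
\begin{bmatrix}
0 & 0 & \cdots & 0 \\
1 & 0 & \cdots & 0 \\
\vdots & \ddots & \ddots & \vdots \\
0 & \cdots & 1 & 0
\end{bmatrix}
\mathbf{s}[l] +
\begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}
s[l+1] \qquad (5)$$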
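
The state equation is simply a shift register: the previous symbols move down by one position and the new symbol enters at the top. A minimal NumPy sketch (function names are illustrative, not from the paper):

```python
import numpy as np

def shift_matrix(m):
    """(m+1) x (m+1) matrix with ones on the first sub-diagonal."""
    return np.eye(m + 1, k=-1)

def next_state(state, new_symbol):
    """Apply the state equation: shift the old symbols down, insert the new one on top."""
    state = np.asarray(state, dtype=float)
    e1 = np.zeros(len(state))
    e1[0] = 1.0
    return shift_matrix(len(state) - 1) @ state + e1 * new_symbol
```

For example, with m = 2, the state [1, -1, 1] and a new symbol -1 give the next state [-1, 1, -1].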

Therefore, equations 4 and 5 respectively correspond to the
observation and state equations of a hidden Markov model. It is
worth emphasising that the hidden process (and therefore the
detected sequence) is the sequence of bits before spreading.
Consider a block of K vectors of observation data
$\mathcal{Y} = (Y[1], Y[2], \cdots, Y[K])$. At iteration i + 1 we
process the whole block of data in order to produce the estimate
of the combination of the beamformer and the propagation channel,
namely $\mathcal{H}(w^{(i)}) = H^H w^{(i)}$. Based on the HMM
formulation, it can be shown as in [1] that the solution provided
by the block EM algorithm is given by:

$$\mathcal{H}(w^{(i)})^{(i+1)} = R^{-1} u \qquad (6)$$

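where matrix R and vector u are expressed as

$$R = \sum_{l=1}^{K} E\{ \mathcal{F}_c(\mathbf{s}[l])^H \mathcal{F}_c(\mathbf{s}[l]) \mid \mathcal{Y}, H^{(i)} \} \qquad (7)$$

$$u = \sum_{l=1}^{K} E\{ \mathcal{F}_c(\mathbf{s}[l])^H\, Y[l] \mid \mathcal{Y}, H^{(i)} \} \qquad (8)$$

Note that the expectation
$E\{\mathcal{F}_c(\mathbf{s}[l]) \mid \mathcal{Y}, H^{(i)}\}$ is
computed as:

$$E\{\mathcal{F}_c(\mathbf{s}[l]) \mid \mathcal{Y}, H^{(i)}\} = \sum_{j} \Pr\{\mathbf{s}[l] = \xi_j \mid \mathcal{Y}, H^{(i)}\}\, \mathcal{F}_c(\xi_j)$$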

Here $\xi_j$ is one among the $2^{m+1}$ possible realisations of
the stochastic process $\mathbf{s}[l]$. The a posteriori
probability $\Pr\{\mathbf{s}[l] = \xi_j \mid \mathcal{Y}, H^{(i)}\}$
is calculated by multiplying the so-called forward and backward
variables of an HMM (see [1]). Note that in order to calculate
these probabilities, we assume that the noise process N[l], which
corresponds to the multiuser interference and the thermal noise,
is a white Gaussian process of unknown variance $\sigma^2$ which
depends on the performance of the beamformer. The EM algorithm for
estimating this parameter leads to:

$$\sigma^{2\,(i+1)} = \frac{1}{NK} \sum_{l=1}^{K} E\{ \| Y[l] - \mathcal{F}_c(\mathbf{s}[l])\, \mathcal{H}(w^{(i-1)})^{(i)} \|^2 \mid \mathcal{Y}, H^{(i)} \} \qquad (9)$$


When the beamformer is not correctly set up, the variance estimate
will be quite large. The a posteriori probabilities will then be
calculated with "flat" Gaussian functions and one can show that
all $\Pr\{\mathbf{s}[l] = \xi_j \mid \mathcal{Y}, H^{(i)}\}$ will
have the same value. As a result, one can show that the channel
estimates will converge to values close to zero. Therefore, the
beamformer will not be able to use any particular information and
will steer a beam in a random direction. In fact, as long as the
variance estimate is large, the beamformer will go on steering
beams in all possible directions until it finds one that
corresponds to a lower multiuser interference. In this case, the
variance estimate will be small, resulting in an accurate channel
estimate which will allow the beamformer to refine its
configuration while keeping the same general steering. Indeed, the
channel taps which correspond to the nulls will be estimated as
zero at the mobile station, which means that the beamformer will
further disregard these channel taps and will no longer transmit
in their direction. At that point, the iterative algorithm has
converged.

6.  SIMULATIONS 

In this section, we highlight the potential benefits of the
proposed method. The desired user is characterised by a spreading
gain of 5, and the number of multipaths between the BS and the
mobile station has been set to 9. We consider the transmission of
the signal through an array of 10 antenna elements. In addition to
the desired user, the system is supporting 10 additional users.
Therefore, there will be a non-negligible multiuser interference
at the desired mobile station.

In figure 1, the upper graph shows the angular power spectra for
the interfering users (dashed curve) and the desired user (solid
curve) as a function of their angle of arrival at the Base
Station. Note that some multipath components of the desired user
share common bearings with some interfering signals. Therefore, it
is expected that the channel estimation operation at the first
iteration will suffer from multiuser interference. This is
confirmed by the graph which shows the beampattern of the
beamformer after the first iteration of the algorithm. One can see
that the beamformer is clearly steering a beam towards directions
where the desired user's signal is weak compared to the
interference level. During iterations 2, 3 and 4, one can see that
the beamformer is steering beams towards directions where the
multiuser interference is quite high. It is interesting to observe
that at iteration 2, the beamformer is steering a beam in a
direction where both the desired user and interference signals are
weak. Although the resulting estimated multiuser interference
variance is low (see figure 2), the solution is not acceptable for
the iterative algorithm: at this stage, the beamformer will
attempt to steer a beam in a direction where the SNR is maximised
(see the beampattern at iteration 3). However, the direction where
the SNR is maximised is also a direction where the multiuser
interference is high. Therefore, this setup is unstable, as at the
next iteration the interference variance will be high again,
resulting in a poor channel estimate. The beamformer then keeps on
scanning. Finally, at iteration 4, the beamformer finds a
direction where the multiuser interference is low and the SNR is
maximised as well. After the beampattern is refined in this
general direction at iteration 5, the algorithm reaches the
convergence stage.

The graph in figure 2 plots the estimates of the fading
coefficients and the interference variance, the latter playing a
key role in the convergence of the iterative scheme.

Figure 1: Power spectrum of desired and interference signals and
beampatterns during iterations

Figure 2: Fading coefficient and noise+interference variance
estimates


7.  CONCLUSION 

In this paper, we have presented a method for efficiently
combining channel estimation and downlink beamforming for CDMA
systems, in cases where the Rake receiver cannot be used for
channel estimation purposes. This method relies on an iterative
scheme which alternates between a channel estimation scheme that
is only stable when the multiuser interference is low and a
beamforming operation that maximises the received Signal to Noise
Ratio. The simulation results presented in this paper suggest that
this iterative scheme converges to solutions which maximise the
Signal to Interference plus Noise Ratio (SINR), which is an
attractive feature since it is achieved without taking into
account the other users' statistics. Additional work is currently
being undertaken to analyse the convergence properties of this
iterative scheme.
ABSTRACT

Spatially distributed networks of radar sensors are being used for
target tracking and for generating a Single Integrated Aerial
Picture (SIAP). In such a network, each sensor generally sends
whatever target track/association information it has to every
other sensor. This has the disadvantage of requiring more
communication bandwidth and processing power. One way to reduce
the communication bandwidth and the processing power is to
discover features that would improve the target detection/track
accuracy, activate those sensors that would provide the missing
information, and form clusters of sensors that have consistent
information. In this paper, we describe a minimax entropy based
technique for feature discovery and a within-class entropy based
technique for feature/sensor discrimination. After the features
are discovered, those sensors that can provide the discovered
features are activated. The decision based on the sensor
discrimination is used in cluster formation. The experimental
details and simulation results provided here indicate that these
metrics are efficient in discovering features and in
discriminating sensors. The techniques described in this paper are
dynamic in nature: as a node acquires information, it decides
whether that information comes from a good sensor in terms of
consistency. This has the advantage of discarding non-valid
information dynamically and making progressive decisions.

1.  INTRODUCTION 

Spatially distributed networks of radar sensors are being used for
target tracking and for generating a Single Integrated Aerial
Picture (SIAP). In such a network, every sensor node generally has
the same information. This is achieved by each sensor sending
whatever target track/association information it has to every
other sensor. This has the disadvantage of requiring more
communication bandwidth and processing power. One way to reduce
the communication bandwidth and the processing power is to
discover features that would improve the target detection/track
accuracy, activate those sensors that would provide the missing
information, and form clusters of sensors that have consistent
information. The sensors that are part of a cluster communicate at
high bandwidth, each sensor sending the information it has to the
other members of the cluster, while the clusters themselves
communicate with each other over a low bandwidth communication
network by transmitting only the fused information about a target
track to other clusters. In this paper, we describe a minimax
entropy based technique for feature discovery and a within-class
entropy based technique for feature/sensor discrimination. After
the features are discovered, those sensors that can provide the
discovered features are activated. The decision based on the
sensor discrimination is used in cluster formation, and a mutual
information metric is used in information fusion to improve the
track accuracy. To the best of the author's knowledge, no study on
sensor discrimination using a within-class entropy metric has been
reported, although one study, described in [1], uses mutual
information to select a subset of features from a larger set. The
technique described in this paper uses within-class entropy as a
metric to discriminate a good sensor from a bad sensor. Unlike our
technique, the technique in [1] is static in nature and cannot
handle the case where the dimensionality of the feature set
varies. In [2], the author shows that, in general, the performance
of a network of sensors can be improved by fusing data from
selected sensors. However, that study did not develop specific
metrics for feature discovery and feature/sensor discrimination,
as is done in this paper. In [3], techniques to represent Kalman
filter state estimates in the form of information (Fisher and
Shannon entropy) are provided. In such a representation it is
straightforward to separate what is new information from what is
either prior knowledge or common information. This separation
procedure is used in the decentralized data fusion algorithms
described in [3]. However, to the best of this author's knowledge,
no study has been reported on using the minimax entropy principle
for feature discovery. In addition, the significance of this study
lies in the possible application of feature discovery and sensor
discrimination to the formation of a cluster of distributed
sensors, including radar sensors, in order to improve the decision
accuracy (such as the target tracking accuracy in the case of a
network of radars) and to reduce the communication bandwidth
requirements. In the next section, the proposed feature discovery
and sensor discrimination techniques are described. The simulation
description and experimental results are provided in section 3.
Conclusions and future research directions are provided in
section 4.

2.  A  BRIEF  DESCRIPTION  OF  THE 
PROPOSED  TECHNIQUES 

2.1  Discovery  of  missing  information: 

In applications such as (a) target detection, identification and
tracking, (b) classification, and (c) coalition formation, the
missing information could correspond to a feature to be
discovered. Discovering it makes it possible to probe (awaken)
only the sensor node that can provide the missing information, and
thus to save power and processing by not arbitrarily activating
nodes. We apply the minimax entropy principle described in [4] for
the feature discovery. The details of the estimation of missing
information, in other words feature discovery, using the minimax
entropy principle are as follows.
2.1.1 Minimax entropy principle:

Let N given values correspond to n different information types.
Let $z_{ij}$ be the jth member of the ith information type (where
an information type is defined as a cluster of values that give
similar information measures) so that

$$j = 1, 2, \ldots, m_i; \quad i = 1, 2, \ldots, n; \quad \sum_{i=1}^{n} m_i = N. \qquad \text{Eq. (1)}$$

Then the entropy for this type of classes of information is:

$$H = -\sum_{i=1}^{n} \sum_{j=1}^{m_i} \frac{z_{ij}}{T} \ln \frac{z_{ij}}{T}, \quad \text{where } T = \sum_{i=1}^{n} \sum_{j=1}^{m_i} z_{ij}. \qquad \text{Eq. (2)}$$

Let $T_i = \sum_{j=1}^{m_i} z_{ij}$. Then the entropy can be
written as

$$H = H_w + H_B, \quad H_w = \sum_{i=1}^{n} \frac{T_i}{T} H_i, \quad H_B = -\sum_{i=1}^{n} \frac{T_i}{T} \ln \frac{T_i}{T}, \qquad \text{Eq. (3)}$$

where $H_i = -\sum_{j=1}^{m_i} \frac{z_{ij}}{T_i} \ln \frac{z_{ij}}{T_i}$
is the entropy of values that belong to information type i.

In the equation above, $H_w$ and $H_B$ are the entropies within
classes (information types) and between classes, respectively. We
would like the types of information to be as distinguishable as
possible and we would like the information within each type to be
as homogeneous as possible. The entropy is high if the values
belonging to a type (class) represent similar information and is
low if they represent dissimilar information. Therefore, we would
like $H_B$ to be as small as possible and $H_w$ as large as
possible. This is the principle of minimax entropy.
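
As a numerical illustration of the within-class and between-class entropies, the following sketch computes H, H_w and H_B for a list of classes. It assumes the standard grouping decomposition of the entropy in Eq. (2), with class totals T_i and per-class entropies H_i weighted by T_i/T; the function names are illustrative and all values are assumed strictly positive.

```python
import numpy as np

def entropy(values):
    """Shannon entropy (natural log) of positive values normalised to sum to one."""
    p = np.asarray(values, dtype=float)
    p = p / p.sum()
    return float(-(p * np.log(p)).sum())

def entropy_decomposition(classes):
    """Return (H, H_w, H_B) for a list of classes, each a list of positive values."""
    totals = np.array([np.sum(c) for c in classes], dtype=float)   # T_i
    T = totals.sum()
    H = entropy(np.concatenate([np.asarray(c, dtype=float) for c in classes]))
    H_B = entropy(totals)                                          # between classes
    H_w = float(sum((t / T) * entropy(c) for t, c in zip(totals, classes)))
    return H, H_w, H_B
```

For any grouping, `H` equals `H_w + H_B` up to rounding, and a class of near-equal values (for example [10, 10, 10]) has a higher entropy than a class of very different values (for example [1, 10, 100]), which is the sense in which a high within-class entropy indicates homogeneous information.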


2.1.2 Application of minimax entropy principle for feature
discovery:

Let z be the missing value (feature). Let T be the total of all
known values, such that the total of all values is T + z. Let
$T_1$ be the total of the known values that belong to the
information type to which z may belong; $T_1 + z$ is then the
total of that particular type of information. This leads to:

$$H = -{\sum}' \frac{z_{ij}}{T+z} \ln \frac{z_{ij}}{T+z} - \frac{z}{T+z} \ln \frac{z}{T+z},$$

$$H_B = -{\sum}'' \frac{T_i}{T+z} \ln \frac{T_i}{T+z} - \frac{T_1+z}{T+z} \ln \frac{T_1+z}{T+z}. \qquad \text{Eq. (4)}$$

Here, $\sum'$ denotes the summation over all values of i, j except
those that correspond to the missing information, and $\sum''$
denotes the summation over all values of i except for the type to
which the missing information belongs.


We can then estimate z by minimizing $H_B/H_w$, $H_B/(H - H_B)$ or
$H_B/H$, or by maximizing $(H - H_B)/H_B$ or $H/H_B$. The
estimates of z provide the missing information values (features)
and the information (feature) type. From the above discussion we
can see that we will be able to discover features as well as the
type of sensor from which these features can be obtained. This has
the advantage of probing the appropriate sensor in a distributed
network of sensors. The transfer of information and the probing
can be achieved in such a network by using network routing
techniques. Before trying to use the newly acquired feature set,
it is advisable to check the relevance of the feature set in terms
of consistency and improvement of accuracy, in order to reduce the
cost of processing. In a distributed network of sensors this has
the added advantage of reducing the communication cost. We measure
the relevance, in other words discriminate a feature set from a
good sensor from one from a bad sensor, by using the within-class
entropy that is described below.
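
One way to carry out this estimation numerically is a plain grid search: try each candidate value z in each information type and keep the pair that minimises H_B/H_w. The sketch below is only an illustration under assumptions not taken from the paper: the search is exhaustive over a user-supplied grid, the entropies use the class-total decomposition of section 2.1.1, every class has at least two positive members (otherwise H_w can be zero), and the function names are hypothetical.

```python
import numpy as np

def _entropy(values):
    """Shannon entropy (natural log) of positive values normalised to sum to one."""
    p = np.asarray(values, dtype=float)
    p = p / p.sum()
    return float(-(p * np.log(p)).sum())

def ratio_between_within(classes):
    """H_B / H_w for a list of classes of positive values."""
    totals = [float(np.sum(c)) for c in classes]
    T = sum(totals)
    H_B = _entropy(totals)
    H_w = sum((t / T) * _entropy(c) for t, c in zip(totals, classes))
    return H_B / H_w

def discover_feature(classes, candidates):
    """Try every (type, z) pair and keep the one that minimises H_B / H_w."""
    best = None
    for k in range(len(classes)):
        for z in candidates:
            trial = [list(c) + [z] if i == k else list(c)
                     for i, c in enumerate(classes)]
            r = ratio_between_within(trial)
            if best is None or r < best[0]:
                best = (r, k, float(z))
    return best  # (ratio, index of the information type, estimated z)
```

The returned type index indicates which kind of sensor would be probed for the missing feature; a finer grid or a one-dimensional optimiser could replace the exhaustive loop.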


2.2  Measure  of  consistency: 


We measure relevance by measuring consistency. For this we have
developed a metric based on within-class entropy, which is
described in this section. Let there be N events (values) that can
be classified into m classes, and let an event $x_{ij}$ be the jth
member of the ith class, where $i = 1, 2, \ldots, m$,
$j = 1, 2, \ldots, n_i$ and $\sum_{i=1}^{m} n_i = N$. The entropy
for this classification can be written as $H = H_w + H_B$, where
$H_w$ is called the entropy within classes and $H_B$ is called the
entropy between classes.

The entropy $H_w$ is high if the values or events belonging to a
class represent similar information and is low if they represent
dissimilar information. This means that $H_w$ can be used as a
measure of consistency. That is, if two or more sensor
measurements are similar, then their $H_w$ is greater than if they
are dissimilar. Therefore, this measure can be used in sensor
discrimination or selection. Note that even though the definitions
of within-class and between-class entropy here are slightly
different from those in section 2.1, they are similar in concept.
Note also that the minimax entropy measure, which uses both
within-class and between-class entropies, was used earlier in the
estimation of missing information; here, within-class entropy is
defined as a consistency measure that can be used in sensor
discrimination or selection. These two metrics have different
physical interpretations and are used for different purposes.

3. SIMULATIONS

The feature discovery and sensor discrimination algorithms
described above have been applied to feature discovery, sensor
discrimination and cluster formation in a network of radar
sensors. Note that this simulation is simplified, since the goal
is to prove the concepts.

This network of sensors is used for tracking multiple targets.
Each sensor node has a local and a global Kalman filter based
target tracker. These target trackers estimate the target states,
namely position and velocity in the Cartesian co-ordinate system.
The local tracker uses the local radar sensor measurements to
compute the state estimates, while the global tracker fuses the
target states obtained from other sensors if they are consistent
and improve the accuracy of the target tracks.

For the purposes of testing the feature/sensor discrimination
algorithm, a network of three radar sensors and a single target
moving with constant velocity were considered. Two sensors were
unbiased and thus considered good, and one was biased and thus
considered bad. The bias was introduced by adding a random number
to the true position of the target. The bias was introduced this
way because the biases in azimuth and range associated with a
radar sensor translate into a measured target position that
differs from the true target position. In addition, in our current
simulations, we assume that the sensors measure the target's
position in the Cartesian co-ordinate system instead of the polar
co-ordinate system. The amount of bias was varied by multiplying
the random number by a constant k, i.e., measured position =
(true position + k * randn) + measurement noise.
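
The measurement model above can be sketched in a few lines of NumPy. The distribution of the measurement noise is not specified in the text, so zero-mean Gaussian noise with standard deviation `noise_std` is assumed here; the function name and parameters are illustrative.

```python
import numpy as np

def simulate_measurements(true_positions, k, noise_std, rng):
    """measured position = (true position + k * randn) + measurement noise."""
    true_positions = np.asarray(true_positions, dtype=float)
    bias = k * rng.standard_normal(true_positions.shape)           # k = 0: unbiased sensor
    noise = noise_std * rng.standard_normal(true_positions.shape)  # assumed Gaussian
    return true_positions + bias + noise
```

A good sensor corresponds to `k = 0` and a bad sensor to `k > 0`; for instance, a constant-velocity track `p0 + v * t` sampled at a few instants can be passed as `true_positions` with `rng = np.random.default_rng(0)`.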

The measurements from the radar at each sensor node were used to
estimate the target states using the local Kalman filter
algorithm. The estimated target states at each sensor node were
transmitted to the other nodes. For this simulation, only the
estimated position was considered, for simplicity.

We consider the estimated state vector as the feature set here.
Since the goal of this simulation is proof of concept, the
feature/sensor discrimination algorithm was implemented at sensor
node 1 with the assumption that it is a good sensor. Let the state
estimate outputs of this node be $A_g$. Let the state estimate
outputs of a second sensor be $B_g$ and those of a third sensor be
$B_b$.

For the computation of entropy, probability values are needed, as
seen from the equation above. To obtain these values, ideally, one
would need probability distribution functions (pdfs). However, in
practice it is hard to obtain closed form pdfs. In the absence of
knowledge of the actual pdfs it is general practice to estimate
them by using histograms [5]. Researchers in signal and image
processing use this technique most commonly [6]. Therefore, we use
the histogram approach here. In order to obtain the histograms, we
initially need some data (features) to know how they are
distributed. For this purpose, it was assumed that initially N
state estimate vectors were accumulated at each sensor node and
that this accumulated vector was transmitted to the other nodes.
Note also that the accuracy of the probability estimates obtained
with the histogram approach depends on the amount of accumulated
(training) data. For non-stationary features, it also depends on
how often the histograms are updated. In practice, since the
training data is limited, we have set N to 10 in this simulation.
To take care of the non-stationarity of the features, we initially
wait until N estimates are obtained at each node. From then on we
update the histograms at every time instant using the new state
estimate and the previous nine state estimates. At each time
instant we discard the oldest feature (oldest state estimate).

To get the probability of occurrence of each feature vector, first
the histogram was computed. For this, a bin size N_bin of 5 was
used. The center point of each bin was chosen based on the
minimum and maximum feature values. In this simulation the bin
centers were set as:

min(feature  values) 

/  \  max(feature values) - min(feature values)


Since the histogram provides the number of elements in a given
bin, it is possible to compute the probabilities from the
histogram. In particular, the probability of a bin is computed as:

(# elements in a particular bin) / (total number of elements)
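The histogram-based probability estimate and the sliding window of the ten most recent state estimates can be sketched as follows. This is a minimal sketch, assuming five equal-width bins spanning the minimum and maximum feature values; the exact bin-center rule of the simulation is not reproduced, and the names `histogram_probabilities` and `window` as well as the sample data are hypothetical.

```python
from collections import deque

def histogram_probabilities(values, n_bins=5):
    """Estimate bin probabilities as (# elements in bin) / (total # elements).

    Assumption: n_bins equal-width bins between min(values) and max(values).
    """
    lo, hi = min(values), max(values)
    counts = [0] * n_bins
    if hi == lo:
        counts[0] = len(values)          # all values identical: one occupied bin
    else:
        width = (hi - lo) / n_bins
        for v in values:
            # the maximum value is clamped into the last bin
            idx = min(int((v - lo) / width), n_bins - 1)
            counts[idx] += 1
    total = len(values)
    return [c / total for c in counts]

# Sliding window of the N = 10 most recent estimates; the oldest one is
# discarded automatically once the window is full.
window = deque(maxlen=10)
stream = [0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 9, 11]   # hypothetical scalar features
history = []
for estimate in stream:
    window.append(estimate)
    if len(window) == window.maxlen:
        history.append(histogram_probabilities(list(window)))
```

Each entry of `history` is the probability vector obtained at one time instant after the initial ten estimates have been accumulated.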

First, the minimax entropy principle was applied to find the
missing information, and the appropriate sensor was probed to obtain
that information. Then the consistency measure (within class
entropy) was applied to check whether the new information
obtained from that particular sensor is consistent with the other
sensors.

In the following two figures, the within class entropy is plotted for
features discovered from two unbiased sensors and from one biased
and one unbiased sensor. The measurement noise level was kept
the same for all three sensors. However, k was set to 1.0 in
Figure 1 and to 2 in Figure 2. The within class entropy
was computed for different iterations using the definition
provided in the previous section. The probability values needed
in this computation were estimated using the histogram approach
as described before. From these two figures, it can be seen that
the within class entropy of two unbiased sensors is greater than
the within class entropy of one biased and one unbiased sensor.
This indicates that the within class entropy can be used as a
consistency measure to discriminate between sensors or to select
sensors.


Figure  1:  Plot  of  within  class  entropy  of  sensors  1  &  2  (unbiased 
sensors)  and,  1  (unbiased)  and  3  (biased).  Bias  constant  k  =  1 


Figure  2:  Plot  of  within  class  entropy  of  sensors  1  &  2  (unbiased 
sensors)  and,  1  (unbiased)  and  3  (biased).  Bias  constant  k  =  2 


We then form a cluster of sensors that are consistent and apply
the mutual information metric that we developed in [7] to
verify whether the mutual information increases. In [7], we
showed that fusing information from sensors when the mutual
information increases improves the decision accuracy. We
transmit the fused decision (which requires much lower
bandwidth than transmitting the decision of each
sensor to every other sensor in the network) to other clusters of
sensors and thus reduce the communication bandwidth requirement.

4 CONCLUSIONS

In this paper, we have described how the minimax entropy principle
can be used in feature (missing information) discovery. Further,
a consistency measure is defined and it has been
shown that this measure can be used in discriminating sensors.
We have also shown how these two measures are used in
cluster formation and in the reduction of the communication
bandwidth requirement. The application of these techniques is not
restricted to a network of radar sensors; they can be applied to any
other type of sensor. Hence, these techniques can be used in a
distributed network of any type of sensors for hierarchical
processing, for cluster formation and for data fusion. Future work
warrants implementing these techniques on distributed sensor
hardware nodes with multiple sensors and testing the capabilities of
these algorithms in hierarchical processing and cluster formation
in the field.

ABSTRACT

Entropy-based divergence measures have shown promising results
in many areas of engineering and image processing. In this paper,
a generalized information-theoretic measure called Jensen-Renyi
divergence is proposed. Some properties such as convexity and its
upper bound are derived. Using the Jensen-Renyi divergence, we
propose a new approach to the problem of ISAR (Inverse Synthetic
Aperture Radar) image registration. The goal is to estimate the
target motion during the imaging time. Our approach applies
Jensen-Renyi divergence to measure the statistical dependence
between consecutive ISAR image frames, which would be maximal if
the images are geometrically aligned. Simulation results demonstrate
a much improved performance of the proposed method in image
registration.

1.  INTRODUCTION 

Image registration is an important problem in computer vision,
remote sensing, data processing and medical image analysis. The
objective of image registration is to find a spatial transformation
such that a dissimilarity metric achieves its minimum between two
or more images taken at different times, from different sensors, or
from different viewpoints.

Inverse Synthetic Aperture Radar (ISAR) [1] is a microwave
imaging system capable of producing high resolution imagery from
data collected by a relatively small antenna. ISAR imagery is
induced by target motion; however, that motion also blurs the
resulting image. After the conventional ISAR translational focusing
process, image registration can be applied to estimate the target
rotational motion parameter, and polar reformatting can then be used
to achieve a higher resolution image.

During the last three decades, a wide range of registration
techniques has been developed for various applications. These
techniques can be classified [2] into correlation methods, Fourier
methods, landmark mapping, and elastic model-based matching.

In the work of Woods [3] and Viola [4], mutual information, a
basic concept from information theory, is introduced as a measure
for evaluating the similarity between images. When the two
images are properly matched, corresponding areas overlap, and the
resulting joint histogram contains high values for the pixel
combinations of the corresponding regions. When the images are
misregistered, non-corresponding areas also overlap and this will
result in additional pixel combinations in the joint histogram. In
case of misregistration, the joint histogram has less sharp peaks

This work was supported by an AFOSR grant F49620-98-1-0190 and
by ONR-MURI grant JHU-72798-S2 and by NCSU School of Engineering.


and is more dispersed than under correct alignment of the images.
The registration criterion is then to find the transformation such
that the mutual information of the corresponding pixel pair
intensity values in the matching images is maximized. This approach
is accepted by many [5] as one of the most accurate and robust
registration measures.

In this paper, a novel generalized information-theoretic measure,
called Jensen-Renyi divergence and defined in terms of Renyi
entropy [6], is introduced. Jensen-Renyi divergence is defined as
a similarity measure for any finite number of weighted probability
distributions. Shannon mutual information is a special case
of the Jensen-Renyi divergence. This generalization gives us
the ability to control the measurement sensitivity of the joint
histogram, which would lead to a better registration result.

In the section that follows, we give a brief statement of the
problem. In Section 3, we introduce the Jensen-Renyi divergence
and its properties. Section 4 is devoted to the application of the
Jensen-Renyi divergence in ISAR image registration. Finally, we
provide some concluding remarks in Section 5.

2.  PROBLEM  STATEMENT 

ISAR imagery registration can be applied to estimate the target
motion during the imaging time. Let $T_{(l,\theta,\gamma)}$ be a Euclidean
transformation with translational parameter $l = (l_x, l_y)$, rotational
parameter $\theta$ and scaling parameter $\gamma$. Given two ISAR images $f$ and
$r$, the objective of image registration is to determine the spatial
transformation parameters $(l^*, \theta^*, \gamma^*)$ such that

$$(l^*, \theta^*, \gamma^*) = \arg\max_{(l,\theta,\gamma)} JR_\alpha^\omega\big(p_1(f, T_{(l,\theta,\gamma)} r), \ldots, p_n(f, T_{(l,\theta,\gamma)} r)\big) \qquad (1)$$

where $JR_\alpha^\omega(\cdot)$ is the Jensen-Renyi divergence with order $\alpha$ and
weight $\omega$. Denote by $X = \{x_1, x_2, \ldots, x_n\}$ and $Y = \{y_1, y_2, \ldots, y_k\}$
the sets of pixel intensity values of $f$ and $T_{(l,\theta,\gamma)} r$ respectively.
Then $\omega_i = P(X = x_i)$ and $p_i(f, T_{(l,\theta,\gamma)} r) = (p_{ij})_{1 \le j \le k}$, where
$p_{ij} = P(Y = y_j \mid X = x_i)$, $i = 1, 2, \ldots, n$, $j = 1, 2, \ldots, k$, is the
conditional probability of $Y = y_j$ given $X = x_i$ for the
corresponding pixel pairs in $f$ and $T_{(l,\theta,\gamma)} r$. Here the Jensen-Renyi
divergence acts as a similarity measure between images, which will
be explained further in the next section.

ISAR imagery is induced by target motion; however, the target
motion also causes time-varying spectra of the received signals.
Motion compensation has to be carried out to obtain a high resolution
image. As the radar keeps tracking the target, the reflected signal
is continuously recorded during the imaging time. By registering
a sequence of consecutive image frames $\{I_i\}_{i=0}^{N}$, the target
motion during the imaging time can be estimated by interpolating
$\{(l_i, \theta_i, \gamma_i)\}_{i=1}^{N}$. Then, based on the trajectory of the target,
translational motion compensation (TMC) and rotational motion
compensation (RMC) [1] can be used to generate a clear image of
the target.

3.  THE  JENSEN-RENYI  DIVERGENCE 

Let $k \in \mathbb{N}$ and $X = \{x_1, x_2, \ldots, x_k\}$ be a finite set with a
probability distribution $p = (p_1, p_2, \ldots, p_k)$, i.e. $\sum_{j=1}^{k} p_j = 1$ and
$p_j = P(X = x_j) \ge 0$, where $P(\cdot)$ denotes the probability.

Renyi entropy is a generalization of Shannon entropy, and is
defined as [6]

$$R_\alpha(p) = \frac{1}{1-\alpha} \log \sum_{j=1}^{k} p_j^\alpha, \quad \alpha > 0 \text{ and } \alpha \ne 1. \qquad (2)$$

For $\alpha > 1$, the Renyi entropy is neither concave nor convex.

For $\alpha \in (0, 1)$, it is easy to see that Renyi entropy is concave,
and tends to Shannon entropy $H(p)$ as $\alpha \to 1$. It can be easily
verified that $R_\alpha$ is a non-increasing function of $\alpha$, and hence

$$R_\alpha(p) \ge H(p), \quad \forall \alpha \in (0, 1). \qquad (3)$$
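A small numerical check of the Renyi entropy and of inequality (3) may help. The sketch below is illustrative only: the function names and the example distribution are hypothetical, and base-2 logarithms are used so that values are in bits.

```python
import math

def shannon_entropy(p):
    """Shannon entropy H(p) in bits; zero-probability terms contribute 0."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def renyi_entropy(p, alpha):
    """Renyi entropy of order alpha (alpha > 0, alpha != 1) in bits."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    return math.log2(sum(x ** alpha for x in p if x > 0)) / (1.0 - alpha)

p = [0.5, 0.25, 0.125, 0.125]          # hypothetical distribution
h = shannon_entropy(p)                 # 1.75 bits
r_half = renyi_entropy(p, 0.5)         # larger than H(p), as in (3)
r_near_one = renyi_entropy(p, 0.999)   # close to H(p) as alpha -> 1
```

For the uniform distribution the Renyi entropy equals log2(k) for every admissible order, which is a convenient sanity check.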

Definition 1 Let $p_1, p_2, \ldots, p_n$ be $n$ probability distributions of
$X$ and $\omega = (\omega_1, \omega_2, \ldots, \omega_n)$ be a weight vector such that
$\sum_{i=1}^{n} \omega_i = 1$ and $\omega_i \ge 0$. The Jensen-Renyi divergence is defined
as

$$JR_\alpha^\omega(p_1, \ldots, p_n) = R_\alpha\Big(\sum_{i=1}^{n} \omega_i p_i\Big) - \sum_{i=1}^{n} \omega_i R_\alpha(p_i),$$

where $R_\alpha(p)$ is the Renyi entropy, $\alpha > 0$ and $\alpha \ne 1$.
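Definition 1 translates directly into code. The following sketch is an illustration under assumptions: `jensen_renyi` and its argument layout (a list of distributions plus a list of weights) are hypothetical names, and base-2 logarithms are assumed.

```python
import math

def renyi_entropy(p, alpha):
    """Renyi entropy of order alpha (alpha > 0, alpha != 1) in bits."""
    return math.log2(sum(x ** alpha for x in p if x > 0)) / (1.0 - alpha)

def jensen_renyi(dists, weights, alpha):
    """Renyi entropy of the weighted mixture minus the weighted mean entropy."""
    k = len(dists[0])
    mixture = [sum(w * p[j] for w, p in zip(weights, dists)) for j in range(k)]
    mean_entropy = sum(w * renyi_entropy(p, alpha) for w, p in zip(weights, dists))
    return renyi_entropy(mixture, alpha) - mean_entropy

same = jensen_renyi([[0.2, 0.8], [0.2, 0.8]], [0.5, 0.5], alpha=0.5)
apart = jensen_renyi([[1.0, 0.0], [0.0, 1.0]], [0.5, 0.5], alpha=0.5)
```

Identical distributions give a divergence of zero, while two distinct degenerate distributions with equal weights give one bit.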

Using the Jensen inequality, it is easy to check that the
Jensen-Renyi divergence is nonnegative for $\alpha \in (0, 1)$. It is also
symmetric and vanishes if and only if the probability distributions
$p_1, p_2, \ldots, p_n$ are equal, for all $\alpha > 0$.

When $\alpha \to 1$, the Jensen-Renyi divergence is exactly the
generalized Jensen-Shannon divergence [7].

Unlike other entropy-based divergence measures such as the
well-known Kullback-Leibler divergence, the Jensen-Renyi divergence
has the advantage of being symmetric and generalizable to
any finite number of probability distributions, with the possibility
of assigning weights to these distributions.

In the sequel, we will restrict $\alpha \in (0, 1)$, unless specified
otherwise, and will use base 2 for the logarithm, i.e., the
measurement unit is bits.

The following result establishes the convexity of the
Jensen-Renyi divergence of a set of probability distributions.

Proposition 1 For $\alpha \in (0, 1)$, the Jensen-Renyi divergence $JR_\alpha^\omega$
is a convex function of $p_1, p_2, \ldots, p_n$.

Proof. Recall that the mutual information between two finite sets
$X = \{x_1, x_2, \ldots, x_n\}$ and $Y = \{y_1, y_2, \ldots, y_k\}$ is given by [8]

$$I(X;Y) = H(Y) - H(Y|X), \qquad (4)$$

where $H(Y)$ is the Shannon entropy of $Y$ and $H(Y|X)$ is the
conditional Shannon entropy of $Y$, given $X$.

Instead of using Shannon entropy in (4), the mutual information
can be generalized using Renyi entropy. Therefore, the
$\alpha$-mutual information can be defined as

$$I_\alpha(X;Y) = R_\alpha(Y) - R_\alpha(Y|X),$$

where $R_\alpha$ is the Renyi entropy of order $\alpha \in (0, 1)$.

Denote $P(X = x_i) = \omega_i$, $P(Y = y_j \mid X = x_i) = p_{ij}$ and
$P(Y = y_j) = q_j$; then it is easy to check that

$$R_\alpha(Y) - R_\alpha(Y|X) = JR_\alpha^\omega(p_1, p_2, \ldots, p_n), \qquad (5)$$

where $p_i = (p_{ij})_{1 \le j \le k}$, for all $i = 1, \ldots, n$.

For fixed $\omega_i$, the mutual information is a convex function of
$p_{ij}$ [8]; it can then be verified that the $\alpha$-mutual information is also
a convex function of $p_{ij}$, which makes the Jensen-Renyi divergence a
convex function of $p_1, p_2, \ldots, p_n$. ■

Proposition 2 The Jensen-Renyi divergence achieves its maximum
value when $p_1, p_2, \ldots, p_n$ are degenerate distributions.

Proof. The domain of $JR_\alpha^\omega$ is a convex polytope in which the
vertices are degenerate probability distributions. Because a convex
function on a convex polytope attains its maximum at a vertex, the
maximum value of the Jensen-Renyi divergence occurs at one of the
degenerate distributions. ■

Since the Jensen-Renyi divergence is a convex function of
$p_1, p_2, \ldots, p_n$, it achieves its maximum value when the Renyi
entropy of the $\omega$-weighted average of degenerate probability
distributions achieves its maximum value too.

The next problem is to assign weights $\omega_i$ to the degenerate
distributions $\Delta_1, \Delta_2, \ldots, \Delta_n$, that is to say, an assignment
$\{\omega_i\} \to \Delta_i = \{\delta_{ij}\}$ must be found, where $\{\delta_{ij}\}$ are probability mass
functions, i.e. $\delta_{ij} = 1$ if $i = j$ and 0 otherwise. The following upper
bound thus holds

$$JR_\alpha^\omega(p_1, \ldots, p_n) \le R_\alpha\Big(\sum_{i=1}^{n} \omega_i \Delta_i\Big). \qquad (6)$$

Without loss of generality, consider the Jensen-Renyi divergence
with equal weights $\omega_i = 1/n$ for all $i$, and denote it simply by
$JR_\alpha$. Using (6), the following holds

$$JR_\alpha \le R_\alpha(a) + \frac{\alpha}{\alpha - 1} \log(n), \qquad (7)$$

where

$$a = (a_1, a_2, \ldots, a_k) \text{ such that } a_j = \sum_{i=1}^{n} \delta_{ij}. \qquad (8)$$

Since $\Delta_1, \Delta_2, \ldots, \Delta_n$ are degenerate distributions, we have
$\sum_{j=1}^{k} a_j = n$. From (7), it is clear that $JR_\alpha$ achieves its
maximum value when $R_\alpha(a)$ achieves its maximum too.
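For the equal-weight case, the step from the bound (6) to a bound in terms of the count vector $a$ can be written out explicitly. The derivation below is a sketch that uses only Definition 1 and the entropy formula (2); it assumes that $R_\alpha(a)$ denotes formula (2) applied to the unnormalized vector $a$, and that $R_\alpha(\Delta_i) = 0$ for every degenerate distribution.

```latex
\begin{align*}
\sum_{i=1}^{n} \frac{1}{n}\,\Delta_i &= \frac{a}{n},
  \qquad a_j = \sum_{i=1}^{n} \delta_{ij}, \\
R_\alpha\!\left(\frac{a}{n}\right)
  &= \frac{1}{1-\alpha}\,\log \sum_{j=1}^{k} \left(\frac{a_j}{n}\right)^{\alpha}
   = \frac{1}{1-\alpha}\,\log \sum_{j=1}^{k} a_j^{\alpha}
     \;-\; \frac{\alpha}{1-\alpha}\,\log n \\
  &= R_\alpha(a) + \frac{\alpha}{\alpha-1}\,\log n .
\end{align*}
```

Since the $\log n$ term does not depend on the assignment, maximizing the right-hand side over assignments amounts to maximizing $R_\alpha(a)$.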

In order to maximize $R_\alpha(a)$, the concept of majorization will
be used [9]. Let $(x_{[1]}, x_{[2]}, \ldots, x_{[k]})$ denote the non-increasing
arrangement of the components of a vector $x = (x_1, x_2, \ldots, x_k)$.

Definition 2 Let $a$ and $b \in \mathbb{N}^k$; $a$ is said to be majorized by $b$,
written $a \prec b$, if

$$\begin{cases} \sum_{j=1}^{k} a_{[j]} = \sum_{j=1}^{k} b_{[j]} \\ \sum_{j=1}^{t} a_{[j]} \le \sum_{j=1}^{t} b_{[j]}, & t = 1, 2, \ldots, k-1. \end{cases}$$

Since $R_\alpha$ is a Schur-concave function, $R_\alpha(a) \ge R_\alpha(b)$
whenever $a \prec b$.
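The majorization order and the Schur-concavity property can be checked numerically on small integer vectors. The sketch below is illustrative; `is_majorized_by` and `renyi_of_counts` are hypothetical helper names, and the entropy formula is applied to unnormalized count vectors, which is an assumption about how $R_\alpha(a)$ is evaluated.

```python
import math

def is_majorized_by(a, b):
    """Return True if a is majorized by b (a ≺ b)."""
    if len(a) != len(b) or sum(a) != sum(b):
        return False
    a_sorted = sorted(a, reverse=True)   # non-increasing arrangement
    b_sorted = sorted(b, reverse=True)
    partial_a = partial_b = 0
    # compare partial sums for t = 1, ..., k-1 (total sums are already equal)
    for x, y in zip(a_sorted[:-1], b_sorted[:-1]):
        partial_a += x
        partial_b += y
        if partial_a > partial_b:
            return False
    return True

def renyi_of_counts(a, alpha):
    """Renyi entropy formula applied to a nonnegative, unnormalized vector."""
    return math.log2(sum(x ** alpha for x in a if x > 0)) / (1.0 - alpha)

spread, peaked = [2, 2, 1], [5, 0, 0]
```

The more evenly spread vector is majorized by the peaked one and has the larger entropy value, as Schur-concavity requires.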

The  following  result  establishes  the  maximum  value  of  the 
Jensen-Renyi  divergence. 

Proposition 3 Let $p_1, p_2, \ldots, p_n$ be $n$ probability distributions
with

$$p_i = (p_{i1}, p_{i2}, \ldots, p_{ik}), \quad \sum_{j=1}^{k} p_{ij} = 1, \quad p_{ij} \ge 0.$$

If $n \equiv r \pmod{k}$, $0 \le r < k$, then

$$JR_\alpha \le \frac{1}{1-\alpha} \log \frac{(k - r)\,q^\alpha + r\,(q + 1)^\alpha}{(qk + r)^\alpha} \qquad (9)$$

where $q = (n - r)/k$, and $\alpha \in (0, 1)$.

Proof. It is clear that the vector

$$g = (\underbrace{q + 1, \ldots, q + 1}_{r}, \underbrace{q, \ldots, q}_{k - r})$$

is majorized by the vector $a$ defined in (8). Therefore, $R_\alpha(a) \le
R_\alpha(g)$. This completes the proof using (7). ■

According to Proposition 3, when $n \equiv 0 \pmod{k}$ the following
inequality holds

$$JR_\alpha(p_1, p_2, \ldots, p_n) \le \log(k).$$
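The bound (9) is easy to evaluate numerically, which also confirms the special case n ≡ 0 (mod k). The function name `jr_upper_bound` is hypothetical and base-2 logarithms are assumed.

```python
import math

def jr_upper_bound(n, k, alpha):
    """Evaluate the right-hand side of (9) in bits for n distributions on k points."""
    r = n % k
    q = (n - r) // k
    numerator = (k - r) * q ** alpha + r * (q + 1) ** alpha
    denominator = (q * k + r) ** alpha
    return math.log2(numerator / denominator) / (1.0 - alpha)

bound_multiple = jr_upper_bound(n=6, k=3, alpha=0.5)   # n is a multiple of k
bound_small_n = jr_upper_bound(n=2, k=3, alpha=0.5)    # fewer distributions than points
```

When n is a multiple of k the bound reduces to log2(k); for n = 2 and k = 3 it evaluates to one bit.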


4.  ISAR  IMAGE  REGISTRATION 


Fig.  1.  ISAR  image  of  moving  target  reconstructed  by  the  Discrete 
Fourier  Transform 

To form a radar image, N bursts of received signals are sampled
and organized burst by burst into an M × N two-dimensional
array. This sample matrix is not uniformly spaced in spatial
frequency; instead, it is polar formatted data. Discrete Fourier
Transform processing of the polar formatted data would result in
blurring at the edges of the target reflectivity image. Fig. 1 is
a synthetic ISAR image of a MIG-25 aircraft [10]. The radar
is assumed to operate at 9 GHz and to transmit a stepped-frequency
waveform. In each burst, 64 stepped frequencies are used. The
pulse repetition frequency is 15 kHz. Basic motion compensation
processing has been applied to the data. A total of 512 samples
of the time history series are taken to reconstruct the image
of this aircraft, which corresponds to 2.18s of integration time.
As we can see, the resulting image is defocused due to the target
rotation. In fact, the defocused image in Fig. 1 is formed
by overlapping a series of MIG-25s at different viewing angles.
By replacing the Fourier transform with time varying spectral
analysis techniques [11], we can take a sequence of snapshots
of the target during the 2.18s of integration time. Fig. 2
shows the trajectory of the MIG-25, with 6 image frames taken
at t = 0.1280s, 0.4693s, 0.8107s, 1.1520s, 1.4933s and 1.8347s
respectively.
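The timing figures quoted above can be cross-checked with simple arithmetic. The sketch assumes that each of the 512 time-history samples corresponds to one burst of 64 pulses; the burst indices used for the six frames (start at burst 30, step of 80 bursts) are inferred from the quoted frame times and are not stated in the text.

```python
PRF_HZ = 15_000.0        # pulse repetition frequency
PULSES_PER_BURST = 64    # stepped frequencies per burst
N_BURSTS = 512           # time-history samples (assumed: one per burst)

burst_rate_hz = PRF_HZ / PULSES_PER_BURST          # 234.375 bursts per second
integration_time_s = N_BURSTS / burst_rate_hz      # about 2.18 s

# Assumption: frames are centred on bursts 30, 110, ..., 430.
FIRST_BURST, BURST_STEP = 30, 80
frame_times_s = [
    round((FIRST_BURST + BURST_STEP * i) / burst_rate_hz, 4) for i in range(6)
]
```

Under these assumptions the integration time and the six frame times agree with the values quoted in the paragraph above.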




Fig.  2.  Trajectory  of  a  sequence  of  MIG-25  image  frames 

Then image registration can be applied to estimate the target
motion along its trajectory. In this specific example, given a sequence
of ISAR image frames $\{I_i\}_{i=0}^{N}$ which are observed in a time
interval $[0, T]$, we search for the rotation angles $\{\theta_i\}_{i=1}^{N}$. Denote
$r = I_{i-1}$ and $f = I_i$ for $i = 1, 2, \ldots, N$; then by Equation (1), $\theta_i$
is given by

$$\theta_i = \arg\max_{\theta} JR_\alpha^\omega\big(p_1(f, T_\theta r), \ldots, p_n(f, T_\theta r)\big).$$
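The search in the equation above can be illustrated with a toy example. To stay self-contained, the sketch replaces image rotation with a circular shift of a one-dimensional quantized signal, so it shows only the structure of the criterion (joint histogram, weights, conditional distributions, arg max) and not the ISAR processing itself. All names and the test signal are hypothetical.

```python
import math
from collections import Counter, defaultdict

def renyi_entropy(p, alpha):
    """Renyi entropy of order alpha (alpha > 0, alpha != 1) in bits."""
    return math.log2(sum(x ** alpha for x in p if x > 0)) / (1.0 - alpha)

def jr_from_pairs(xs, ys, alpha):
    """JR divergence of the conditional distributions P(Y | X = x_i),
    weighted by P(X = x_i), estimated from corresponding sample pairs."""
    n = len(xs)
    levels_y = sorted(set(ys))
    joint = defaultdict(Counter)              # joint histogram: joint[x][y]
    for x, y in zip(xs, ys):
        joint[x][y] += 1
    weights, dists = [], []
    for row in joint.values():
        count_x = sum(row.values())
        weights.append(count_x / n)
        dists.append([row[y] / count_x for y in levels_y])
    mixture = [sum(w * p[j] for w, p in zip(weights, dists))
               for j in range(len(levels_y))]
    mean_entropy = sum(w * renyi_entropy(p, alpha) for w, p in zip(weights, dists))
    return renyi_entropy(mixture, alpha) - mean_entropy

def best_shift(f, r, alpha=0.5):
    """Circular shift of r that maximizes the JR divergence against f."""
    scores = {s: jr_from_pairs(f, r[-s:] + r[:-s], alpha) for s in range(len(r))}
    return max(scores, key=scores.get)

r = [0, 0, 1, 3, 2, 2, 1, 0, 3, 1, 2, 0]   # reference "frame"
f = r[-3:] + r[:-3]                         # same frame, shifted by 3 samples
estimated_shift = best_shift(f, r)
```

At the correct alignment every conditional distribution is degenerate, so the weighted entropy term vanishes and the divergence reaches its largest value; any other shift of this test signal leaves at least one conditional distribution spread over several levels.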


Fig. 3 shows the rotation angles $\{\theta_i\}$ obtained by
registering the 6 consecutive MIG-25 image frames. As we can see, $\alpha$
plays an important role in controlling the measurement sensitivity.
When $\alpha < 1$, the peak of the JR-divergence vs $\theta$ is much sharper
than that of the traditional Shannon entropy ($\alpha = 1$) based mutual
information method. A sharper peak helps to obtain a
more accurate estimate of the true rotation angle.

By interpolating the estimated rotation angles, we obtain a
trajectory of the MIG-25 rotational motion during the imaging time,
as shown in Fig. 3. Polar reformatting [1] can then be used to
re-sample the received signal into rectangular format and generate a
clear image of the MIG-25 based on all the received signals in the
time interval [0, 2.18s], as demonstrated in Fig. 4.


Fig.  3.  Image  registration  of  a  MIG-25  Trajectory 


5.  CONCLUSIONS 


A generalized information-theoretic divergence measure based on
the Renyi entropy is proposed in this paper. We proved the
convexity of this divergence measure and derived its maximum value.
Using the Jensen-Renyi divergence, we propose a new approach
to the problem of ISAR image registration. The ISAR imagery is
induced by target rotation, which in turn causes time varying
spectra of the reflected signals and blurs the target image. The goal
of ISAR image registration is to estimate the target motion for
further motion compensation processing. Our approach applies
Jensen-Renyi divergence to measure the statistical dependence between
consecutive ISAR image frames, which would be maximal if the
images are geometrically aligned. Compared to the mutual
information based registration techniques, the Jensen-Renyi divergence
gives us the ability to control the measurement sensitivity of the
joint histogram. This flexibility would result in a better
registration accuracy. Maximization of the Jensen-Renyi divergence is a
very general criterion, because no assumptions are made regarding
the nature of this dependence and no limiting constraints are
imposed on image contents. Simulation results demonstrate that
our approach obtains an accurate estimation of target rotation
automatically without any prior feature extraction.


Fig. 4. Reconstructed MIG-25 by polar reformatting

ABSTRACT

Space-time adaptive processing (STAP) has emerged as a key
technology for improving the performance of radar systems
required to operate in the presence of severe and dynamic
interference which generally includes clutter as well as jamming. While
the theory of optimum STAP is well known, practical issues such
as interference heterogeneity, finite sample support, mismatched
signal models and computational load need to be overcome when
it comes to implementing STAP in operational radar systems. This
paper proposes an advanced STAP formulation which addresses
important issues facing practical implementation and then tailors
this general formulation for the case of interference rejection in
over-the-horizon (OTH) radar to experimentally evaluate its target
detection and localisation performance.

1.  INTRODUCTION 

The optimal STAP weight vector, which maximises the
signal-to-interference plus noise ratio (SINR) at the radar output, is given
in terms of the space-time interference covariance matrix and the
desired signal response vector for each azimuth-range-Doppler bin
processed [1,2]. In practice, the statistically expected interference
covariance matrix is unknown and must be estimated from the
received data; similarly, the desired signal response vector is often
approximated with a suitable model called a steering vector. The
sample matrix inverse (SMI) technique directly substitutes the sample
interference covariance matrix and the signal response vector model
for their statistically expected or “true” forms in order to estimate
the optimal STAP weight vector for the adaptive implementation.

The SMI technique is known to converge slowly towards the
optimal solution when the “training data” used to form the sample
covariance matrix contains the test cell(s) to be processed [3,4].
For this reason, the majority of STAP algorithms exclude the test
cell(s), referred to as primary data, from the training set, referred
to as secondary data. Although this strategy avoids the potential for
target self-nulling, degradations in output SINR may result if the
second order statistics of the interference are heterogeneous over
the primary and secondary data vectors. Furthermore, the use of
partitioned data sets introduces another latent but significant
practical problem regarding unwanted detections or false alarms [5].

Strong target signals or clutter discretes that are present in
the test cell, but at an angle-Doppler different to the “look”
angle-Doppler, need to be suppressed from the output but may appear as
false alarms through the sidelobes of the adapted pattern
because the secondary data does not contain information about them

The first author acknowledges A. Farina for useful discussions on
matched subspace detectors and B. White for replaying experimental data.


and consequently, the resulting adaptive weights are unlikely to
suppress such signals effectively. This problem has received
insufficient attention in the literature; in fact, [6] states that the
problem of target detection in a non-homogeneous test cell “has not
been addressed” and proposes a method for addressing this “hitherto
unsolved problem”. The proposed method uses a non-statistical step
which operates on the test cell only to cancel targets and clutter
discretes in the sidelobes, followed by a statistical step employing
secondary data to cancel residual interference in the test cell.

An alternative attack on this problem suggested by [7] involves
the inclusion of a “piece” of the test cell in the training data set
to “balance the risk of target self-nulling against the opportunity
to capture the actual nonstationary interference present within the
test cell”. This intuitively appealing method proposes the use of a
“deemphasis” factor ranging between 0 and 1 to scale the amount
of the test cell included in the training data (0 = no test cell in
training set, 1 = test cell completely in training set). A scheme for
selecting the deemphasis factor was not proposed and the authors
stated that “the best criterion for choosing this factor is an open
problem and deserves further study”.

This paper proposes a method for selecting the deemphasis
factor for the test cell(s) in the primary data. Moreover, this
factor is dependent on the look angle-Doppler bin so that target
self-nulling can effectively be avoided while simultaneously reducing
false alarms and capturing the potentially different interference
characteristics in the primary data. The proposed method is
incorporated into an advanced STAP formulation which includes
various features useful for practical implementation such as
subspace models which take imperfect target signal coherence in
angle and/or Doppler into account [8,9], localised processing for
enhanced heterogeneous or nonstationary interference cancellation
[10,11], rank reduction for increased convergence rate and
computational speed [12] and alternative loading techniques to reduce
degradations caused by finite sample support [13].

2.  STAP  ALGORITHM 

The generalised sidelobe canceller (GSC) STAP implementation
operates by forming a matched filter to a series of candidate
signal models, denoted by steering vectors $\mathbf{s}(\psi)$ for different signal
parameters $\psi = \psi_1, \ldots, \psi_N$, to cover the search space quickly yet
with some degree of reliability, and subtracting an auxiliary
adaptive filter $\mathbf{w}_a$ which has minimal impact on desired signals entering
through the “main lobe” of the matched filter but cancels as much
interference as possible leaking through its “sidelobes”.

An adaptive GSC-STAP implementation which incorporates
subspace signal models, localised processing, rank reduction, and
steer dependent deemphasis factors is formulated in terms of the
minimiser $\mathbf{w}_a$ of the following criterion function:

$$f(\mathbf{w}) = \|\mathbf{D}\mathbf{X}(\mathbf{v} - \mathbf{w})\| + \lambda\|\mathbf{C}\mathbf{w} - \mathbf{f}\| + \kappa\|\mathbf{S}^{1/2}(\mathbf{v} - \mathbf{w})\| \qquad (1)$$

where $\mathbf{v}$ is a (possibly tapered) steering vector with its dependence
on $\psi$ suppressed for notational convenience, $\|\cdot\|$ denotes the
Frobenius or Euclidean squared norm, and the significance of the other
terms in (1) is explained below. The resultant space-time adaptive
weight vector $\mathbf{w}_r = \mathbf{v} - \mathbf{w}_a$ is then used to filter the primary
space-time snapshot or test cell $\mathbf{x}$ to form a scalar GSC-STAP
output $y = \mathbf{w}_r^H \mathbf{x}$. In general, the $N$-dimensional complex vector
$\mathbf{x}$ is the sum of a signal component $\mathbf{s}$ with normalised space-time
structure $\mathbf{s}^H \mathbf{s} = 1$, amplitude $p > 0$ and reference phase $e^{j\phi}$, and
uncorrelated interference-plus-noise $\mathbf{n}$:

$$\mathbf{x} = p\,e^{j\phi}\,\mathbf{s} + \mathbf{n} \qquad (2)$$

The interference-plus-noise in the test cell $\mathbf{n}$ is assumed to be
zero-mean and multi-variate complex Gaussian distributed with
statistically expected spatial covariance matrix $\mathbf{R}_n = E\{\mathbf{n}\mathbf{n}^H\}$. Ideally,
the signal vector $\mathbf{s}$ can be represented by a space-time steering
vector $\mathbf{s}(\psi_0)$ parameterised for example by $\psi_0 = [\theta_0, \omega_0]$ where $\theta_0$
and $\omega_0$ are the unknown signal angle-of-arrival and Doppler
frequency respectively.

The first term in (1) implements deemphasised localised processing
by defining a $P \times N$ matrix $\mathbf{X} = [\mathbf{x}_1 \cdots \mathbf{x}_P]^H$ composed of
$P$ test cell vectors $\mathbf{x}_p$ for $p = 1, \ldots, P$ and a $P \times P$ diagonal
matrix $\mathbf{D} = \mathrm{diag}[\gamma_1 \cdots \gamma_P]$ containing the primary data deemphasis
factors $\gamma_p$ with their $\psi$-dependence momentarily suppressed. In
essence, the role of the deemphasis factors is to “supervise”
adaptive filter training by allowing $\gamma_p \to 1$ when the snapshot $\mathbf{x}_p$ is
likely to contain interference only or interference plus a sidelobe
signal (i.e. a discrete interferer) and $\gamma_p \to 0$ when $\mathbf{x}_p$ is likely
to contain interference plus a target signal. A method for selecting
the deemphasis factors $\gamma_p(\psi_n)$ for each system steer parameter $\psi_n$
($n = 1, \ldots, N$) is described later.

The second term in (1) concerns the application of linear
constraints on the auxiliary adaptive weight vector $\mathbf{w}_a$ to prevent
target cancellation when the criterion function $f(\mathbf{w})$ is minimised
and to form optional “anticipatory” nulls or derivative constraints
on the adapted pattern [14]. The $M < N$ linear constraints,
specified by the $M \times N$ constraint matrix $\mathbf{C}$ and the $M$-dimensional
constraint vector $\mathbf{f}$, can be enforced by letting $\lambda \to \infty$. To
minimise target self-nulling caused by signal model mismatch (i.e.
differences between the received signal $\mathbf{s}$ and the most closely
matched steering vector $\mathbf{s}(\psi_n)$ where $n = 1, 2, \ldots, N$), a low-rank
linear subspace model can be adopted: $\mathbf{s} = \mathbf{H}(\psi)\mathbf{p}$, where the
term $\mathbf{H}(\psi) \in \mathbb{C}^{N \times M}$ is a pre-determined full rank mode matrix
(e.g. a wave interference model $\mathbf{H}(\psi) = [\mathbf{s}(\psi + \Delta_1) \cdots \mathbf{s}(\psi + \Delta_M)]$
where the $\Delta_m$ for $m = 1, 2, \ldots, M \ll N$ are positive
or negative displacements closely clustered to the nominal system
steer parameter $\psi$ [8]) and $\mathbf{p} \in \mathbb{C}^{M \times 1}$ is an unknown coordinate
vector. In this case, setting $\mathbf{C}$ to the Hermitian of $\mathbf{H}(\psi)$, $\mathbf{f}$ to the
zero vector and $\lambda \to \infty$ ensures that the auxiliary weight
vector $\mathbf{w}_a$ estimated for sidelobe cancellation remains orthogonal to
signals spanned by the target subspace.

The final term in (1) represents the output power of the GSC
(weighted by a factor $\kappa$) in a manner consistent with the
interference-plus-noise sample covariance matrix $\mathbf{S}$ given by

$\mathbf{S} = \frac{1}{K}\sum_{k=1}^{K} \mathbf{n}_k \mathbf{n}_k^H = \mathbf{U}\boldsymbol{\Sigma}\mathbf{U}^H$,  $\mathbf{S}^{-1/2} = \mathbf{U}\boldsymbol{\Sigma}^{-1/2}\mathbf{U}^H$  (3)

where $\mathbf{n}_k$ for $k = 1, \ldots, K$ are judiciously chosen [15] secondary
sample vectors assumed to contain interference-plus-noise only
and $\mathbf{U}\boldsymbol{\Sigma}\mathbf{U}^H$ represents the eigen-decomposition of $\mathbf{S}$. This term
stabilises the adaptive pattern so as to maintain effective interference
cancellation when the number of primary data vectors in $\mathbf{X}$
that are likely to contain interference-plus-noise only is limited, as
well as to increase robustness against target self-nulling caused by
deemphasis factor estimation errors.

When the target protection constraints are strictly enforced
($\lambda \to \infty$) and no other constraints are applied ($\mathbf{f} = \mathbf{0}$), the
solution for the resultant weight vector $\mathbf{w}_r$ is given by

$\mathbf{w}_r = \mathbf{Z}^{-1}\mathbf{C}^H[\mathbf{C}\mathbf{Z}^{-1}\mathbf{C}^H]^{-1}\mathbf{C}\mathbf{v}$  (4)

where $\mathbf{Z} = \mathbf{X}^H\mathbf{D}^H\mathbf{D}\mathbf{X} + \kappa\mathbf{S}$ may be regarded as the deemphasised
primary data covariance matrix “loaded” to a level $\kappa$ by the
secondary data covariance matrix $\mathbf{S}$. The resultant vector in (4) is
calculated for each system steer parameter $\psi = \psi_1, \ldots, \psi_N$.
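
As a brief consistency check on the constrained solution (a sketch added for illustration, not a step taken from the original derivation), left-multiplying the resultant weight vector by the constraint matrix shows that the constraints hold exactly:

```latex
\begin{aligned}
\mathbf{w}_r &= \mathbf{Z}^{-1}\mathbf{C}^H\left[\mathbf{C}\mathbf{Z}^{-1}\mathbf{C}^H\right]^{-1}\mathbf{C}\mathbf{v},\\
\mathbf{C}\mathbf{w}_r
  &= \left(\mathbf{C}\mathbf{Z}^{-1}\mathbf{C}^H\right)\left[\mathbf{C}\mathbf{Z}^{-1}\mathbf{C}^H\right]^{-1}\mathbf{C}\mathbf{v}
   = \mathbf{C}\mathbf{v},\\
\mathbf{C}\mathbf{w}_a
  &= \mathbf{C}\left(\mathbf{v}-\mathbf{w}_r\right) = \mathbf{0} = \mathbf{f}.
\end{aligned}
```

With $\mathbf{C} = \mathbf{H}^H(\psi)$ this gives $\mathbf{w}_r^H\mathbf{H}(\psi)\mathbf{p} = \mathbf{v}^H\mathbf{H}(\psi)\mathbf{p}$ for every coordinate vector $\mathbf{p}$, so any signal lying in the modelled target subspace passes through the adaptive filter with the same response as through the quiescent vector $\mathbf{v}$.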


3.  DEEMPHASIS  FACTOR 


Using the theory of matched subspace detectors (MSDs) [16] and
adaptive subspace detectors (ASDs) [17], the value of $\gamma(\psi) \in [0, 1]$
is adaptively generated based on the degree of confidence
that a target with SNR greater than some prescribed value is present
in the test cell. In the adaptive case, the unknown interference covariance
matrix $\mathbf{R}_n$ is estimated by its sample covariance matrix $\mathbf{S}$
in (3), and it has been shown that when an unknown scaling $\sigma^2$ exists
between the interference-plus-noise in the primary data and that in
the secondary, maximising the likelihood functions yields ASDs
which are sample matrix versions of the corresponding MSDs [17].

As the interference may have heterogeneous statistical properties
over the radar sampling grid, it is considered beneficial to
use secondary sample vectors $\mathbf{n}_k$ located physically “close” to the
test cell(s) on the basis that nearby data is statistically homogeneous
or at least more nearly so. A rank-reduction linear transform
$\mathbf{T} \in \mathbb{C}^{N \times L}$ of full rank $L$ may be incorporated for adaptive
subspace detection to improve convergence rate (at the expense of
reduced degrees of freedom) when the number of local secondary
samples $K$ is limited to improve the quality of interference homogeneity.
Following this lead, the data $\mathbf{x}$ and signal model $\mathbf{H}(\psi)$
are transformed to a reduced-rank quasi-whitened space,

$\mathbf{z} = \boldsymbol{\Phi}^{-1/2}\mathbf{T}^H\mathbf{x}$,  $\mathbf{G}(\psi) = \boldsymbol{\Phi}^{-1/2}\mathbf{T}^H\mathbf{H}(\psi)$  (5)

by defining the reduced-rank sample covariance matrix given by
$\boldsymbol{\Phi} = \mathbf{T}^H\mathbf{S}\mathbf{T}$, its eigen-decomposition $\boldsymbol{\Phi} = \mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^H$ and the Hermitian
square root $\boldsymbol{\Phi}^{1/2} = \mathbf{Q}\boldsymbol{\Lambda}^{1/2}\mathbf{Q}^H$. It is then proposed to use
the ratio of the energy in $\mathbf{z}$, per dimension, that lies in the resulting
signal subspace (assumed to have full rank $M$) to the energy in $\mathbf{z}$,
per dimension, that lies in the orthogonal subspace to indicate the
“likelihood” of target signal presence within the test cell:


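$F(\psi) = \dfrac{\mathbf{z}^H\mathbf{P}_G(\psi)\,\mathbf{z}\,/\,M}{\mathbf{z}^H\{\mathbf{I} - \mathbf{P}_G(\psi)\}\,\mathbf{z}\,/\,(L - M)}$  (6)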


where $\mathbf{P}_G(\psi) = \mathbf{G}(\psi)[\mathbf{G}^H(\psi)\mathbf{G}(\psi)]^{-1}\mathbf{G}^H(\psi)$ is the projector
onto the transformed target signal subspace model. As the number
of statistically homogeneous samples $K$ increases, $\boldsymbol{\Phi} \to \mathbf{T}^H\mathbf{R}_n\mathbf{T}$ and
the ratio in (6) tends to be F-distributed, $F(\psi) \to F_{(\alpha,\beta)}[\nu]$, with
$\alpha = 2M$ and $\beta = 2(L - M)$ degrees of freedom and non-centrality
parameter $\nu = (p^2/\sigma^2)\,\tilde{\mathbf{s}}^H\tilde{\mathbf{R}}_n^{-1}\tilde{\mathbf{s}}$, where $\tilde{\mathbf{s}} = \mathbf{T}^H\mathbf{s}$ and
$\tilde{\mathbf{R}}_n = \mathbf{T}^H\mathbf{R}_n\mathbf{T}$, providing there is no signal mismatch, or stated
mathematically $\mathbf{s} = \mathbf{H}(\psi)\mathbf{p}$. While this distribution is not strictly
valid for the practical finite sample ASD, it tends to be approximately
valid when sufficiently large and quasi-homogeneous training
sets are used and strong enough target signals are considered.
A soft decision approach which allows $\gamma(\psi)$ to vary continuously
between 0 and 1 can be proposed as

$\gamma(\psi) = \int_{F(\psi)}^{\infty} p_F(f)\,df$,  $F : F_{(\alpha,\beta)}[\nu]$  (7)

where $p_F(f)$ is the probability density function of the appropriate
non-central F-distribution for a design signal-to-noise ratio (SNR)
$\nu$. The selection of $\nu$ represents a tradeoff between higher sensitivity
to weaker signals (lower $\nu$) and enhanced localisation of
stronger signals (higher $\nu$). Although no optimality is claimed for
this method, its performance will be experimentally demonstrated
in the next section.
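
The energy ratio used as the test statistic can be sketched in a few lines. The code below is an illustrative assumption rather than the authors' implementation: it takes an already whitened vector `z` and the columns of the transformed subspace model as plain Python lists, builds an orthonormal basis by Gram-Schmidt, and returns the per-dimension energy ratio. The function names are hypothetical, and the tail integral of the non-central F-distribution that maps the ratio to a deemphasis factor is not implemented here.

```python
import math

def inner(a, b):
    """Hermitian inner product a^H b of two complex vectors given as lists."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def orthonormal_basis(columns, tol=1e-12):
    """Gram-Schmidt orthonormalisation of the subspace spanned by `columns`."""
    basis = []
    for col in columns:
        v = list(col)
        for q in basis:
            c = inner(q, v)
            v = [vi - c * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(inner(v, v).real)
        if norm > tol:  # skip columns that add no new direction
            basis.append([vi / norm for vi in v])
    return basis

def f_ratio(z, g_columns):
    """Energy per dimension inside the subspace over energy per dimension outside."""
    L = len(z)
    basis = orthonormal_basis(g_columns)
    M = len(basis)
    inside = sum(abs(inner(q, z)) ** 2 for q in basis)
    outside = inner(z, z).real - inside
    return (inside / M) / (outside / (L - M))

# Toy case: L = 4, subspace spanned by the first two coordinate axes (M = 2).
z = [1 + 0j, 1j, 1 + 0j, 0j]
G = [[1, 0, 0, 0], [1, 1, 0, 0]]   # non-orthogonal columns, same span
ratio = f_ratio(z, G)
```

Projecting through an orthonormal basis avoids forming the matrix inverse in the projector explicitly; a vector lying entirely inside the subspace would make the denominator zero, which this sketch does not guard against.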

4.  EXPERIMENTAL  RESULTS 

The data for this study were collected by 16 uniformly spaced
narrowband receivers of the high frequency (3-30 MHz) Jindalee
OTH radar, located near Alice Springs in central Australia. The
linear aperture spanned by the 16 receivers is approximately 1.4 km,
with each receiver connected to a subarray composed of 28 dual-fan
vertically polarised antenna elements; see [8] for further details
regarding this facility. The data received in each subarray
is range formed and Doppler processed using conventional (FFT-based)
processing. In this section we are concerned with the adaptive
filtering of space-time vectors constructed by stacking two array
snapshots from consecutive range cells (the first being the test
range cell) recorded at a particular Doppler bin in order to detect
and azimuthally localise useful signals embedded in structured
interference.

The data consists of a superposition of radio frequency interference
(RFI) emitted by a source situated in the Darwin region
(1250 km to the north of the receiver site and 22 degrees from the
array boresight) at a carrier of 16.052 MHz and a relatively weaker
coherent signal transmitted from a spatially separate source less
than 2 minutes earlier at 16.106 MHz. Both interference and signal
were propagated by the ionosphere to the radar, which was operated
in passive mode (i.e. with transmitters switched off). The receive
system processed a total of 42 ranges in each pulse repetition interval
(PRI), with 128 PRI used for Doppler processing during a
4.2 second coherent processing interval (CPI). The coherent signal
was localised at known ARD coordinates (namely, range cell 26,
Doppler bin 33 and beam number 7) but could not be detected after
conventional processing due to the presence of RFI, which spread
over the entire range-Doppler search space.

To perform STAP at test range cell 26, the secondary samples
used to form the sample covariance matrix $\mathbf{S}$ were taken from all
Doppler bins processed in range cells 24 and 28, which immediately
neighbour a guard cell placed either side of the test cell in
order to isolate the secondary data from range sidelobes of the coherent
radar signal. Note that the number $K$ of local training samples
used to form $\mathbf{S}$ is $K = 2P = 8N$ where $P = 128$ Doppler
bins and $N = 32$ (i.e. 16 receivers and 2 stacked range cells). The
target subspace model $\mathbf{H}(\psi)$ adopted was the wave interference
model for 16 nominal steer directions corresponding to mutually
orthogonal array steering vectors, $\mathbf{s}(\psi_n)^H\mathbf{s}(\psi_m) = \delta(n - m)$, and
the target subspace dimension was chosen to be $M = 3$ with components
$\Delta_1 = 0$ and $\Delta_2 = -\Delta_3$ equal to half the Rayleigh distance
either side of $\psi$. No tapering of the matched filter, $\mathbf{v} = \mathbf{s}(\psi)$,
was used to form the GSC-STAP output for $P = 128$ space-time
vectors in the primary data matrix $\mathbf{X} = [\mathbf{x}_1 \cdots \mathbf{x}_P]^H$ (one for each
Doppler bin at the test range cell 26).
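
The training-cell bookkeeping described above can be written down as a small sketch. The helper name `training_range_cells` and its `guard` and `per_side` parameters are hypothetical and chosen only to mirror the description (one guard cell and one training cell on each side of the test cell).

```python
def training_range_cells(test_cell, guard=1, per_side=1):
    """Range cells used for training, skipping `guard` cells either side of the test cell."""
    below = [test_cell - guard - i for i in range(per_side, 0, -1)]
    above = [test_cell + guard + i for i in range(1, per_side + 1)]
    return below + above

cells = training_range_cells(26)   # guard cells are 25 and 27
P = 128                            # Doppler bins taken from each training range cell
N = 16 * 2                         # 16 receivers x 2 stacked range cells
K = len(cells) * P                 # number of secondary snapshots
```

With these values the number of secondary snapshots equals 2P and also 8N, matching the sample support quoted in the text.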


4.1.  Traditional  SMI  filtering 

The solid and dotted curves in Fig. 1 relate to the left vertical axis
and represent the Doppler spectra resulting in range cell 26 and
beam number 7 when the standard SMI technique is used without
the test cells ($\kappa = K$, $\mathbf{D} = \mathbf{0}$) and with the test cells ($\kappa = K$,
$\mathbf{D} = \mathbf{I}$) included for training in (1), respectively. Evidently, the
inclusion of the test cells (dotted line) is detrimental in the standard
SMI approach as it causes target self-nulling at Doppler bin 33.
Note that a loss of 5-10 dB in target signal strength is apparent
compared to the case where test cells are excluded (solid line).

The (+) and (*) symbols in Fig. 1 relate to the right vertical
axis and represent the normalised beam spectra (i.e. noisefloor at
0 dB) resulting in range cell 26 and Doppler bin 33 when the test
cells are excluded and included respectively. Although the maximum
occurs in beam number 7 in both cases, the exclusion of the
test cell (+) causes several false alarms (i.e. peaks in beamspace
significantly above the noisefloor) due to the unregulated sidelobe
response, while the inclusion of the test cell (*) results in a better
sidelobe response but significantly degraded SINR (approximately
10 dB) at beam number 7 compared with the former due to target
self-nulling. In a companion paper [18] that deals with spatial-only
processing, it is shown that attempts to stabilise the sidelobes
by diagonal loading [19] lead to intolerable degradations in interference
cancellation performance and hence do not constitute a
feasible solution for this problem.


4.2.  Deemphasised  SMI  filtering 

The solid line and (+) symbols in Fig. 2 (same format as Fig. 1)
show the Doppler and normalised beam spectra corresponding to
GSC-STAP filters adapted using the $(\psi)$-dependent deemphasis
factors $\gamma(\psi)$ calculated according to (7) with a design SNR of
$\nu = 6$ dB, an $L = 16$ rank-reduction transform $\mathbf{T} = [\mathbf{I} \mid \mathbf{0}]^H$ which
selects the test range cell (i.e. spatial-only processing) and a loading
factor $\kappa = 1$. A constant design value of $\nu = 6$ dB was used for
all azimuth-Doppler cells to avoid re-calculation of the cumulative
non-central F-distribution.

The dotted curve and (*) symbols in Fig. 2 show the Doppler
and normalised beam spectra resulting for a quiescent vector
$\mathbf{v}(\psi) = \mathbf{t} \otimes \mathbf{s}(\psi)$ employing a cosine-pedestal taper $\mathbf{t}$ with no adaptive
sidelobe cancellation. The adaptively deemphasised SMI output has a
target SINR of almost 40 dB in beam number 7, which is much larger
than that of the standard SMI approaches considered in Fig. 1, as
well as having the added advantage of completely removing false
alarms by effectively reducing all other peaks in beamspace to the
noisefloor. A comparison between the adaptively deemphasised
SMI output and the conventional one in Fig. 2 demonstrates that
adaptive sidelobe cancellation removes an extra 40 dB of the passively
received interference that leaks through the sidelobes of the
quiescent vector, allowing detection of the target, which is otherwise
not detected by conventional processing.


5. CONCLUSION AND FUTURE WORK


The  effectiveness  of  the  adaptively  deemphasised  GSC-STAP  method 
proposed  in  this  paper  was  demonstrated  experimentally  (with  radar 
operated  in  passive  mode)  and  shown  to  outperform  traditional 
SMI  methods  both  in  terms  of  target  output  SINR  and  reduction 
of  false  alarms  at  moderately  higher  computational  cost. 

The  derivation  of  the  deemphasis  factor  is  not  claimed  to  be 
optimal  and  current  research  is  being  directed  at  the  problem  of 
formulating appropriate optimality criteria for this application.

Fig. 1. Solid line and (+) symbol correspond to ($\kappa = K$, $\mathbf{D} = \mathbf{0}$),
dotted line and (*) symbol correspond to ($\kappa = K$, $\mathbf{D} = \mathbf{I}$).

Fig. 2. Solid line and (+) symbol correspond to ($\kappa = 1$, $\mathbf{D}(\psi)$,
$\mathbf{T} = [\mathbf{I} \mid \mathbf{0}]^H$, $\nu = 6$ dB), dotted line and (*) symbol
correspond to $\mathbf{v}(\psi) = \mathbf{t} \otimes \mathbf{s}(\psi)$.

ABSTRACT


Two opposing requirements for digital broadband Electronic Warfare
receivers are detection of simultaneous signals and real time
operation. The monobit receiver represents an attempt to achieve
both characteristics at the expense of low instantaneous dynamic
range. This paper presents a detailed theoretical and experimental
analysis and characterization of the performance of this promising
receiver.


1. INTRODUCTION

The proliferation of electronic signals in modern combat environments
requires the use of sophisticated Electronic Warfare (EW)
receivers. Desirable characteristics of EW receivers include wide
band frequency coverage, high sensitivity and dynamic range, high
probability of intercept, simultaneous signal detection, frequency
resolution, and full real-time operation. A classical receiver which
accomplishes these requirements is a channelized receiver [1], which
separates signals according to their frequencies.

Advancements in Analog-to-Digital Converter (ADC) technology
and in the speed of digital processors have made it possible
to design relatively wide band digital channelized receivers.
However, broadband digital channelized receivers, mainly based
on Discrete Fourier Transform (DFT) related processing, are
computation intensive and not yet suitable for real time applications, in
spite of the revolution in the speed of DSPs and FPGAs¹. In an attempt
to improve the real time operation, parallel processing can be
considered. Another possibility is the reduction of the computational
complexity of the signal processing algorithms by the simplification
of the operations, e.g., avoiding complex multiplications in
the calculations.

This is the philosophy of the monobit channelized receiver described
in several US patents [2, 3] and papers [4]. As was
pointed out in [4], there are two ways to avoid
multiplications in the calculation of the DFT: a single-bit digital
representation of the input signal [2], which is equivalent to using a
hard limiter, or a monobit representation of the kernel of the DFT
[3]. Both schemes are possible, and it is even possible to use both in
the same processing algorithm.

The optimum scheme in terms of number of operations for
the DFT is the Fast Fourier Transform (FFT). An FFT algorithm
without multiplications is only possible with a monobit kernel.

This paper focuses on the theoretical and experimental evaluation
of the performance, capabilities, and limitations of this receiver
for the detection of multiple signals for radar applications.


2. DESCRIPTION OF THE SYSTEM

The  system  considered  in  this  paper  is  depicted  in  figure  1.  The 
radio-frequency  front  end  is  not  included.  The  receiver  uses  a  one 
bit ADC followed by a filter bank represented by a monobit DFT
or FFT. Finally, a decision is made using the modulus of the different
outputs of the DFT or FFT. In the following sections, different
important characteristics of the behaviour of this receiver will be
explained.

This  monobit  receiver  was  implemented  with  a  commercial 
module  composed  of  an  ADC  SMT320  of  12  bits  (we  use  only 
the  sign  bit)  and  a  DSP  from  TI:  TMS320C40.  The  experimental 
set-up  is  depicted  in  figure  2. 



Fig.  1.  Schematic  diagram  of  the  analysed  monobit  receiver. 


Fig. 2. Experimental set-up: signal generators (noise and signals),
combiner, ADC, and DSP.


3.  MONOBIT  DFT  AND  FFT 


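The concept of monobit DFT can lead to different implementations
where the kernel function $e^{j\phi}$ is rounded to 1, −1, j or −j through
the function $G(e^{j\phi})$. In this paper, the following function will be
used [4]:

$G(e^{j\phi}) = \begin{cases} 1 & \text{if } -\pi/4 \le \phi < \pi/4 \\ j & \text{if } \pi/4 \le \phi < 3\pi/4 \\ -1 & \text{if } 3\pi/4 \le \phi < 5\pi/4 \\ -j & \text{if } -3\pi/4 \le \phi < -\pi/4 \end{cases}$  (1)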


1  The  aim  of  the  proposed  system  is  to  cover  hundreds  of  MHz  with 
100%  real  time. 


Due to the use of equation (1), the property of the DFT which
ensures that, for a real-valued input sequence, the output sequence
satisfies $X^*(k) = X(N - k)$ no longer applies,
as it is not always true that $G(e^{j\phi}) = G^*(e^{-j\phi})$.
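
The loss of conjugate symmetry can be seen in a small sketch of the kernel quantiser. This is an illustrative assumption, not the receiver's code: the function name `monobit_kernel` is hypothetical, phases are restricted to multiples of 2π/N so that integer arithmetic keeps the sector edges exact, and each sector is taken as closed at its lower edge and open at its upper edge.

```python
def monobit_kernel(m, n_points):
    """Quantise exp(j*2*pi*m/n_points) to 1, j, -1 or -j.

    Sectors are pi/2 wide and centred on 0, pi/2, pi and 3*pi/2; a phase that
    falls exactly on an edge is assigned to the sector starting there
    (assumed convention).
    """
    m %= n_points
    sector = ((8 * m + n_points) // (2 * n_points)) % 4
    return (1, 1j, -1, -1j)[sector]

N = 32
m = N // 8                               # phase pi/4, exactly on a sector edge
forward = monobit_kernel(m, N)           # G(e^{+j pi/4})
mirrored = monobit_kernel(-m, N)         # G(e^{-j pi/4})
symmetric_at_edge = forward == complex(mirrored).conjugate()
symmetric_inside = monobit_kernel(1, N) == complex(monobit_kernel(-1, N)).conjugate()
```

Under the assumed edge convention the symmetry fails only for phases that land exactly on a sector edge, which is enough to break the usual mirror relation between channels k and N − k.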

Three  possibilities  of  implementing  the  monobit  DFT  have 
been  considered,  which  lead  to  different  filter  banks: 

•  To replace the original kernel function directly with $G(\cdot)$.
This implementation will be called monobit DFT. This is
the slowest implementation.

•  To implement the DFT using the decimation in time algorithm,
and replace the coefficients $e^{j\phi}$ by $G(e^{j\phi})$. It will
be called monobit FFT by decimation in time.

•  To implement the DFT using the decimation in frequency
algorithm, and replace the coefficients $e^{j\phi}$ by $G(e^{j\phi})$. It
will be called monobit FFT by decimation in frequency.

The use of the function $G(\cdot)$ modifies the coefficients of the
filters in the three implementations, and so their frequency responses.
However, the filters obtained from both monobit FFT
implementations are nearly equal. As an example, the frequency
response of the filter of channel 4 for a 32-point FFT and of its
counterpart in a monobit FFT are compared in figure 3. A new
sidelobe only 7.6 dB below the maximum appears in the monobit FFT.
Windowing cannot improve this result.

It  has  been  found  that  for  the  different  implementations  of  the 
monobit  DFT  or  FFT  there  are  always  two  different  kinds  of  chan¬ 
nels.  They  are  defined  as: 

•  Type 1 channels: the kernel equals 1 or −1 for both the
original and the monobit implementations. As the input signal
is real valued, the output of this kind of channel is also
real. The only channels of this kind are channel 0 (DC
component) and channel N/2 (high frequency component).
Quantization error due to the function $G(\cdot)$ is zero
as the coefficients of the filters have not changed.

•  Type 2 channels: in each of the remaining channels, half of
the coefficients are real valued (1 or −1) and half
are imaginary (j or −j). The coefficients of channels N/4 and 3N/4
are also unchanged by the use of $G(\cdot)$.
The output of a type 2 channel is a complex
number.
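
The two channel types can be checked numerically for the direct monobit DFT. The sketch below is an illustration under stated assumptions, not the authors' implementation: the kernel is taken as exp(−j2πnk/N), the quantiser uses sectors closed at their lower edge, and the function names are hypothetical.

```python
def monobit_kernel(m, n_points):
    """Quantise exp(j*2*pi*m/n_points) to 1, j, -1 or -j (assumed edge convention)."""
    m %= n_points
    sector = ((8 * m + n_points) // (2 * n_points)) % 4
    return (1, 1j, -1, -1j)[sector]

def channel_coefficients(k, n_points):
    """Monobit coefficients of channel k, one per input sample n = 0..N-1."""
    return [monobit_kernel(-n * k, n_points) for n in range(n_points)]

def count_real(k, n_points):
    """Number of coefficients of channel k that are +1 or -1."""
    return sum(1 for c in channel_coefficients(k, n_points) if c in (1, -1))

def channel_type(k, n_points):
    """Type 1 if every coefficient is real, otherwise type 2."""
    return 1 if count_real(k, n_points) == n_points else 2

def monobit_dft(x):
    """Direct monobit DFT: only additions and sign/axis swaps, no multiplications needed."""
    n_points = len(x)
    return [sum(c * s for c, s in zip(channel_coefficients(k, n_points), x))
            for k in range(n_points)]

N = 16
type1 = [k for k in range(N) if channel_type(k, N) == 1]
real_counts = {count_real(k, N) for k in range(N) if channel_type(k, N) == 2}
outputs = monobit_dft([1, -1] * (N // 2))   # hard-limited tone at N/2
```

For the sizes tried here every type 2 channel has exactly half real and half imaginary coefficients, and the alternating ±1 input concentrates all of its energy in channel N/2.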


Fig.  3.  Filter  response  for  channel  4  from  an  original  32-point  DFT 
and  from  a  monobit  32-point  FFT  (-  -). 


4.  FALSE  ALARM  PROBABILITY 


The 1-bit ADC fixes the power at its output independently of the
input power. Therefore, the false alarm probability is independent
of the input noise power. As shown in the previous section,
there are two different kinds of channels in the receiver, whose
behaviour will be analysed separately.

In a type 1 channel, the filter coefficients are all equal to 1 or
−1. Besides, the samples of the input signal are also 1 or −1.
After multiplying them, a vector with $k_+$ elements equal to 1 and
$k_-$ elements equal to −1 is obtained. The sum of these elements
renders $k = k_+ - k_-$, which is always an even number between
$-N$ and $N$ for an N-point DFT (FFT) with $N = 2^b$, $b$ a natural
number. Besides, $k_+ = (N + k)/2$ and $k_- = (N - k)/2$.
After some reasoning based on combinatorial theory, we can
calculate an analytical expression for the probability of getting an
output $k$:

$P(k) = \binom{N}{\frac{N-k}{2}}\frac{1}{2^N} = \binom{N}{\frac{N+k}{2}}\frac{1}{2^N}$  (2)

After a linear detector at the output of each filter, we obtain an
even value $k' = |k|$ with values $0 \le k' \le N$. The probability
of getting a concrete $k'$ is the sum of the probabilities of getting
$k = k'$ and $k = -k'$, except for $k' = 0$. In short:

$P(k') = \begin{cases} \binom{N}{\frac{N}{2}}\frac{1}{2^N} & \text{if } k' = 0 \\ \binom{N}{\frac{N-k'}{2}}\frac{1}{2^{N-1}} & \text{if } 0 < k' \le N \end{cases}$  (3)

If a threshold $Th$ is chosen at the output of the linear detector, the
false alarm probability is obtained as:

$P_{fa}(Th) = \sum_{k' > Th} P(k')$  (4)

where $P_{fa}$ is a staircase function, as is clear in figure 4.
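
The type 1 false alarm probability can be evaluated exactly with integer binomials. The following is a minimal sketch, assuming equiprobable and independent ±1 samples under noise only and a strict "greater than" threshold test; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def p_output(k, n_points):
    """Probability that a type 1 channel outputs the sum k of N values +/-1."""
    if abs(k) > n_points or (n_points + k) % 2:
        return Fraction(0)
    return Fraction(comb(n_points, (n_points + k) // 2), 2 ** n_points)

def p_detector(k_abs, n_points):
    """Probability of the linear-detector output k' = |k|."""
    if k_abs == 0:
        return p_output(0, n_points)
    return p_output(k_abs, n_points) + p_output(-k_abs, n_points)

def false_alarm(threshold, n_points):
    """Probability that the detector output exceeds the threshold (N even)."""
    return sum((p_detector(k, n_points)
                for k in range(0, n_points + 1, 2) if k > threshold),
               Fraction(0))

N = 4
distribution = {k: p_detector(k, N) for k in range(0, N + 1, 2)}
staircase = [false_alarm(th, N) for th in (0, 1, 2, 3, 4)]
```

Because the detector output only takes even values, the false alarm probability stays constant between consecutive even thresholds, which is the staircase behaviour noted in the text.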


Fig. 4. False alarm probability for both type 1 and type 2 channels
and a 256-point DFT-FFT.

In a type 2 channel the output is a complex number. Its real
and imaginary parts are both even numbers greater than $-N/2$
and lower than $N/2$. Following the same steps used for a type 1
channel, the probabilities of getting the real part of the complex
number equal to $k_r$ and its imaginary part equal to $k_i$ are:


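$P_r(k_r) = \binom{N/2}{\frac{N/2 + k_r}{2}}\frac{1}{2^{N/2}}$,  $P_i(k_i) = \binom{N/2}{\frac{N/2 + k_i}{2}}\frac{1}{2^{N/2}}$  (5)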


Besides, if there is only white noise at the input of the system, $k_i$
and $k_r$ are independent and the probability of obtaining the complex
number $k_r + j \cdot k_i$ is:

$P(k_r + j \cdot k_i) = P_r(k_r)\,P_i(k_i)$  (6)


The output of these channels is converted to a real number using
a linear detector. At its output, a threshold $Th$ is chosen, and the
false alarm probability can be computed as:

$P_{fa} = \sum_{|k_r + j \cdot k_i| > Th} P(k_r + j \cdot k_i)$  (7)

This mathematical expression can be applied to any channel of the
three implementations as long as it is a type 2 channel.
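
A sketch of the type 2 calculation follows. It rests on assumptions that should be read as such: the real and imaginary parts are each modelled as the sum of N/2 independent, equiprobable ±1 terms (half of the coefficients being real and half imaginary), and the threshold test is a strict "greater than". Function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def p_part(k, n_terms):
    """Probability that a sum of n_terms equiprobable +/-1 values equals k."""
    if abs(k) > n_terms or (n_terms + k) % 2:
        return Fraction(0)
    return Fraction(comb(n_terms, (n_terms + k) // 2), 2 ** n_terms)

def false_alarm_type2(threshold, n_points):
    """Probability that |k_r + j*k_i| exceeds a non-negative threshold."""
    half = n_points // 2
    values = range(-half, half + 1, 2)
    total = Fraction(0)
    for k_r in values:
        for k_i in values:
            # compare squared magnitudes so integer thresholds stay exact
            if k_r * k_r + k_i * k_i > threshold * threshold:
                total += p_part(k_r, half) * p_part(k_i, half)
    return total

def distinct_magnitudes(n_points):
    """Number of different detector output levels a type 2 channel can produce."""
    half = n_points // 2
    values = range(-half, half + 1, 2)
    return len({a * a + b * b for a in values for b in values})

N = 8
pfa_low = false_alarm_type2(0, N)
pfa_high = false_alarm_type2(5, N)
pfa_none = false_alarm_type2(6, N)
levels = distinct_magnitudes(N)
```

For N = 8 the largest possible magnitude is N/√2 ≈ 5.66, so any threshold at or above it gives zero false alarms, and the channel offers six distinct output levels against five for a type 1 channel of the same length.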

As an example, in figure 4 the false alarm probability as a
function of the threshold for a monobit 256-point DFT (FFT) is
shown. This figure makes clear the following facts:

•  Both functions are staircase functions for any of the three
implementations.

•  The number of different values of false alarm probability
available is appreciably lower in a type 1 channel than in a
type 2 channel.

•  If a threshold $Th > N$ for a type 1 channel or $Th > N/\sqrt{2}$
for a type 2 channel is chosen, both the false alarm probability
and detection probability are 0 because there are no
outcomes with that magnitude.


5.  DETECTION  PROBABILITY 

A figure of merit of this system regarding detection capabilities
is its loss compared to a system without digitalization and
with the original DFT for fixed detection and false alarm probabilities.
Figure 5 presents the average losses for centred sinusoidal
signals, $P_d = 90\,\%$ and $P_{fa} = 10^{-3}$, and different lengths of the
monobit DFT. Each point of the figure was calculated using Monte
Carlo simulation with 5000 independent trials.

The  detection  capabilities  can  change  depending  on  the  im¬ 
plementation  of  the  filter  bank  and  the  frequency  of  the  sinusoidal 
signal  to  be  detected. 

All the channels of the monobit DFT have losses of nearly 1
dB with respect to the channels $i \cdot N/4$ ($i = 0, \ldots, 3$) for sinusoidal
signals centred in a channel. These filters are equal to the ones
obtained with the original DFT. A monobit FFT has different losses
depending on the selected filter. For example, maximum losses of
1.5 (2) dB appear for a 64 (256)-point monobit FFT.

Additional losses must be considered when the input signal is
not at the centre frequency of the filter. The impact on the detection
probability cannot be offset simply by increasing the signal power at the
input of the system to compensate for the attenuation of the filter. The
reason is the non-linearity of the 1-bit ADC, which fixes the energy
at its output independently of the input energy. As a consequence,
a signal between two adjacent filters may not be detected for a fixed
threshold because the energy is spread between these filters.


Fig. 5. Average losses for a monobit N-point DFT, $P_{fa} = 10^{-3}$
and $P_d = 0.9$, for centred sinusoidal signals.

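
The effect of the hard limiter on an off-centre tone can be illustrated with a short noise-free sketch. It is an assumption-laden illustration only: it uses an ordinary DFT rather than the monobit filter bank, an arbitrary starting phase of 0.3 rad, and hypothetical function names.

```python
import cmath
import math

def one_bit_adc(samples):
    """Keep only the sign of each sample, as a 1-bit ADC would."""
    return [1 if s >= 0 else -1 for s in samples]

def dft_magnitudes(x):
    """Magnitudes of an ordinary N-point DFT (not the monobit version)."""
    n = len(x)
    return [abs(sum(x[i] * cmath.exp(-2j * math.pi * i * k / n) for i in range(n)))
            for k in range(n)]

def limited_tone(amplitude, bin_position, n_points, phase=0.3):
    """Hard-limited sinusoid whose frequency sits at `bin_position` DFT bins."""
    tone = [amplitude * math.cos(2 * math.pi * bin_position * i / n_points + phase)
            for i in range(n_points)]
    return one_bit_adc(tone)

N = 32
centred = limited_tone(1.0, 4.0, N)      # tone at the centre of channel 4
louder = limited_tone(100.0, 4.0, N)     # same tone, 40 dB stronger
between = limited_tone(1.0, 4.5, N)      # tone midway between channels 4 and 5
energy = sum(s * s for s in centred)
peak_centred = max(dft_magnitudes(centred)[: N // 2])
peak_between = max(dft_magnitudes(between)[: N // 2])
```

Raising the input amplitude leaves the limited sequence unchanged, so the lower peak of the off-centre tone cannot be recovered by adding input power; it can only be addressed through the threshold or the filter bank length.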

The simulated detection probabilities for sinusoidal signals
with random frequencies within the bandwidth of each filter are
shown in figures 6 and 7 for $P_{fa} = 10^{-6}$, for $N = 64$ and
$N = 128$, respectively. The dispersion of the traces is caused
by the different amplitude responses of the filters. The most important
result of these figures is that it is impossible to obtain
a mean detection probability per channel of 90 % together with a fixed
$P_{fa} = 10^{-6}$ by employing a monobit 64-point DFT-FFT. However,
a monobit 128-point DFT-FFT can overcome this drawback.


Fig. 6. Detection probability for sinusoidal signals with random
frequencies in the band of each channel. Each trace represents a
channel. 64-point monobit FFT and $P_{fa} = 10^{-6}$. These curves
are compared to the average response of sinusoidal signals centred
in different channels (Centred frequency).


Fig. 7. Detection probability for sinusoidal signals with random
frequencies in the band of each channel. Each trace represents a
channel. 128-point monobit FFT and $P_{fa} = 10^{-6}$. These curves
are compared to the average response of sinusoidal signals centred
in different channels.


6.  DYNAMIC  RANGE 

One of the most important requirements for a channelized receiver
is the detection of simultaneous signals. Due to the high non-linearity
of the 1-bit ADC, there will be a capture effect [5] and a
reduction of the instantaneous dynamic range (the ability to process
concurrent signals of different amplitude) [6]. Therefore, a
detailed study of the dynamic range with a single signal and with
two simultaneous signals (with the same power or with different
power) has been performed. The main conclusions can be summed
up briefly.

Spurious signals generated by the non-linearity of the ADC can be
detected with a significant detection probability (see figure 8),
although it is possible to predict the channels where they appear once
the original frequency is detected and to avoid them by blanking.
The (+) symbols represent results obtained with the experimental
set-up.
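
One way to predict the channels to blank is sketched below. It assumes, as an illustration rather than as the authors' stated method, that the spurious signals are the odd harmonics produced by a symmetric hard limiter and that they fold back into channels 0 to N/2 for a real-valued input; the function name and the choice of harmonic orders are hypothetical.

```python
def spur_channels(signal_channel, n_points, orders=(3, 5, 7)):
    """Channels in 0..N/2 where odd harmonics of a hard-limited tone fold."""
    channels = []
    for h in orders:
        k = (h * signal_channel) % n_points
        # a real input mirrors channel k onto channel N - k
        channels.append(n_points - k if k > n_points // 2 else k)
    return channels

folded = spur_channels(8, 64)
blanking_set = sorted(set(folded) - {8})
```

For a tone centred in channel 8 of a 64-point transform, the third harmonic lands in channel 24, which agrees with the spurious channel reported in figure 8; under this model the fifth harmonic folds onto the same channel and the seventh folds back onto the signal channel itself.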

The probability of false alarm for the channels where neither
the original frequency nor the spurious signals appear decreases with the
power of the input signal due to the capture effect (figure 8).

For two signals with the same power, the detection probabilities
for both signals decrease compared to the detection probability
of one signal with the same power. This effect can be explained
again by bearing in mind that the power at the output of the ADC is
constant and is spread among different channels.

The instantaneous dynamic range is less than 5 dB for a 64-point
monobit FFT-DFT². This result varies slightly with the position
of the signals in the filter bank. Figure 9 shows a dynamic
range of 3 dB for sinusoidal signals in channels 7 and 17. If the
length of the FFT increases, the instantaneous dynamic range also
increases.

7.  CONCLUSIONS 

A monobit channelized receiver has been analysed both theoretically and
experimentally. Simplifications in the operations of the FFT
improve real time operation. However, one-bit digitalization and the
simplifications in the FFT result in losses and a rather low instantaneous
dynamic range compared with the multibit approach.

2  The instantaneous dynamic range is defined at $P_d = 10\,\%$ for
the weaker signal.


Fig. 8. Simulated and experimental results (+) for the detection
probability with the input signal centred in channel 8. Signal in
channel 24 is spurious. 64-point monobit FFT using decimation in
frequency, and $P_{fa} = 10^{-3}$.


Fig. 9. Simulated and experimental results (+) for the instantaneous
dynamic range. SNR for the signal in channel 17 is 5 dB
and the power for the signal in channel 7 varies. 64-point monobit
FFT using decimation in frequency. Threshold for $P_{fa} = 10^{-3}$.

ABSTRACT

Variable structure multiple model (VSMM) is one of the most
powerful algorithms for effectively tracking a single maneuvering
target. Although VSMM was developed specifically to improve on
the interactive multiple model (IMM) method by
reducing computational cost and improving tracking
performance, it has an inherent limitation in the form of
mode set jump delay (MJD). In this paper, MJD is described and
analyzed as an undesirable phenomenon in VSMM.
In order to eliminate the MJD, a neural network based
VSMM that automatically selects the optimal mode set, as
learned by supervised training, is proposed. Through
representative simulations we show that the proposed algorithm
outperforms the conventional digraph switching VSMM
in terms of tracking error.


1.  INTRODUCTION 

Target tracking is the process of estimating the state of a moving object from contaminated measurements. The Kalman filter has been the most popular tool for tracking a moving object whose dynamics vary slowly. However, when the target changes its dynamics rapidly, the estimation error increases. To overcome the error during maneuvering, past research efforts have converged on essentially four approaches: (1) single filter reactive adaptation, (2) variable dimension filtering, (3) cascaded filtering, and (4) multiple model (MM) filtering [1]. Among them, the MM method is known to be the most promising [1]. MM methods can be divided into two classes: (1) fixed structure multiple model (FSMM) and (2) VSMM. The most representative FSMM method is IMM, proposed in the 1980s. In IMM, the mode jump is modeled as a Markov process, and the input to each filter is statistically summarized, or mixed, from the previous estimates in order to reduce the computational load exhibited by the GPB algorithm [2]. However, IMM has a drawback. As the number of models increases, the conflict among models increases the estimation error. On the other hand, when the number of models is kept small, tracking performance degrades, since a small mode set cannot adequately cover all target movements, including maneuvers [3]. X. Rong Li et al. [3] proposed the VSMM algorithm to deal with this dilemma in a theoretical approach. However, an important assumption in VSMM is overlooked. VSMM models the mode jump as a Markov process, which fixes a moving pattern: a predetermined Markov transition matrix implies a pattern of mode jumps. When a target maneuvers, the maneuver may cause a mode jump that does not fit the pattern encoded in the predetermined Markov transition matrix. Because of the fixed jump pattern, a wrong mode is selected, which in turn causes an estimation error. As the search proceeds, the tracker eventually finds the correct mode, but only after additional scans. We call this additional scan time the 'mode jump delay (MJD)'.

In this paper, the MJD problem is first described and analyzed. We then propose an algorithm that reduces the effect of MJD in Section 3. Through representative simulations in Section 4, we show that the proposed algorithm outperforms both the Digraph Switching VSMM (DSVSMM) and the α-β filter.

2.  VSMM  AND  ITS  LIMIT 

2.1  A  Simple  VSMM 

For mode set matched filtering, the discrete dynamic and measurement equations are given as

$$x_{k+1}(M_{k+1}(i)) = F_{k,k+1}\, x_k(M_k(j)) + v_k(M_{k+1}(i)) \qquad (1)$$

$$z_k(M_k(j)) = H_k(M_k(j))\, x_k(M_k(j)) + w_k(M_k(j)) \qquad (2)$$

where $x_k$ is a state vector, $z_k$ is a measurement, $v_k$ is a process noise vector, $w_k$ is a measurement noise vector, $H_k$ is a measurement sensitivity matrix, $M_k(i)$ is the i-th of the N mode sets at scan k, and $F_{k,k+1}$ is the state transition matrix from scan k to k+1.

An assumption in VSMM is that the mode set transition is modeled as a Markov process, whose transition matrix T is given by:

$$T = [t_{ij}], \quad i, j = 1, \ldots, N \qquad (3)$$

$$t_{ij} = P\{M_{k+1}(j) \mid M_k(i)\} \qquad (4)$$


0-7803-7011-2/01/$10.00 ©2001 IEEE




where the predetermined $t_{ij}$ is the probability that mode set i transfers to mode set j after one scan.

Based on the Markov process, the admissible mode set [3] is given by:

$$\mathbf{M}_{k+1} = \{ M \mid \exists\, a_{k+1},\; P\{M_{k+1} \mid M_k, a_{k+1}\} > 0 \} \qquad (5)$$

where M is a mode set that is an element of the total collection of mode sets. Based on these two mode set jump assumptions, the mode set matched estimate at scan k+1 is given as

$$\hat{x}_{k+1|k+1} = \sum_{i} P\{M_{k+1}(i)\}\, \hat{x}_{k+1|k+1}(M_{k+1}(i)) \qquad (6)$$

where $\hat{x}_{k+1|k+1}$ is the overall state estimate, $P\{M_{k+1}(i)\}$ is the mode set probability, and $\hat{x}_{k+1|k+1}(M_{k+1}(i))$ is the mode set matched estimate. In VSMM, how to calculate $P\{M\}$ is the key.


2.2 MJD in VSMM

Before discussing the MJD phenomenon, consider a design problem in VSMM. In the previous section, the mode jump is based on a Markov process, so designing a VSMM requires defining a Markov transition matrix. However, no general rule is known for defining the Markov transition matrix in VSMM. The other problem caused by the Markov process assumption is MJD.

The mode set sequence probability up to scan k+1 in VSMM [3,4] is given by:

$$P\{M^{k+1} \mid Z^{k+1}\} = \frac{1}{c}\, p\{z_{k+1} \mid M^{k+1}, Z^k\}\, P\{M_{k+1} \mid M^k, Z^k\}\, P\{M^k \mid Z^k\} \qquad (7)$$

where c is a normalization constant, $Z^k$ is the sequence of measurements, and $M^k$ is the sequence of mode sets up to scan k.

In Equation (7), the first term, $p\{z_{k+1} \mid M^{k+1}, Z^k\}$, is the likelihood of the mode set sequence $M^{k+1}$ given $z_{k+1}$. That is, the first term is the updated information from the measurement. The second term, $P\{M_{k+1} \mid M^k, Z^k\}$, is the mode transition probability, which is predetermined using the Markov process assumption in VSMM. The second term acts as a prior that describes how a mode set jumps to another mode set in each scan. In practice, a target does not move in one predetermined mode jump pattern. As a result, Equation (7) leads to an erroneous mode set probability calculation due to the second term. The mode set for state estimation is selected among the admissible mode sets that satisfy Equation (5). A wrong mode set probability for each mode set causes incorrect selection of the admissible mode set. However, as scans increase, the estimated mode set approaches the true mode set if the effect of the first term in Equation (7) becomes more dominant than that of the second term. This is the essence of the MJD problem.
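
The delay caused by the prior term can be illustrated with a minimal sketch. It is not the authors' simulation: it collapses Equation (7) to a recursive filter over two hypothetical mode sets, with an assumed symmetric transition matrix (`stay_prob` on the diagonal) and an assumed constant likelihood ratio in favor of the new mode set once the target has maneuvered. The function counts the scans needed before the posterior of the true mode set exceeds 0.5.

```python
def scans_until_switch(stay_prob, likelihood_ratio, max_scans=1000):
    """Count scans until the true (second) mode set becomes the most probable.

    Hypothetical setup: the tracker starts certain of mode set 1, while the
    target has already jumped to mode set 2. Each scan applies the Markov
    prior (second term of Eq. (7)) and then the measurement likelihood
    (first term of Eq. (7)), followed by normalization.
    """
    p1, p2 = 1.0, 0.0  # posterior from the previous scan
    for scan in range(1, max_scans + 1):
        # Prediction with the predetermined transition probabilities.
        pred1 = stay_prob * p1 + (1.0 - stay_prob) * p2
        pred2 = (1.0 - stay_prob) * p1 + stay_prob * p2
        # Measurement update: mode set 2 explains the data better.
        post1 = 1.0 * pred1
        post2 = likelihood_ratio * pred2
        c = post1 + post2  # normalization constant
        p1, p2 = post1 / c, post2 / c
        if p2 > 0.5:
            return scan
    return None


if __name__ == "__main__":
    for stay in (0.5, 0.9, 0.99):
        print(stay, scans_until_switch(stay, likelihood_ratio=5.0))
```

In this toy setting an uninformative prior (`stay_prob = 0.5`) lets the measurement term win at the first scan, whereas a prior that strongly favors staying in the old mode set needs additional scans before the tracker switches, which is the MJD effect in miniature.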


3.  NEURAL  NET  BASED  VSMM 

3.1  One  way  of  reducing  MJD 

To reduce the MJD effect, we develop a new mode set selection method to replace Equation (7). One simple way is for the tracker to remember possible mode set jump patterns.

In order to describe a jump pattern, three kinds of information are needed.

•  Previous mode set information (PMI).

•  Current measurement information (CMI).

•  Current target information (CTI), i.e., the current target's mode set for estimation.

PMI is the summary of past information, such as the state estimation mode set at the previous scan. CMI is the updated information from the current measurement, such as the measurement at the current scan. CTI is the information about the target at the current scan, such as the current mode set.

Before a measurement is obtained, PMI is available; after a measurement is obtained, CMI is available. The last term on the right side of Equation (7) corresponds to PMI, the first term corresponds to CMI, and the left side of Equation (7) is similar to CTI. Therefore the mode selector should find CTI from PMI and CMI. The second term on the right side of Equation (7) is not known. If a function f that maps {PMI, CMI} to {CTI} is known, then we can determine the current CTI of a target using f. This mapping function is one solution that reduces the MJD effect. However, since a target's movement depends on non-deterministic factors such as the pilot's will, the situation around the target, or commands issued to the target, f is not exactly deterministic.


3.2  Neural  Net  Based  VSMM  (NNVSMM) 

How can the mapping function f be found? The function f is very complex and cannot be expressed in a formula. In such a case a heuristic method is very useful. Among heuristic methods, a neural network can provide a suboptimal solution. Since a backpropagation (BP) neural network can handle nonlinear mapping systems [5], we can establish the mode set selection logic using a BP net. A simple mode set selector is shown in Figure 1.
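
As a rough illustration of such a selector, the sketch below trains a one-hidden-layer network by plain full-batch backpropagation to map {PMI, CMI} to CTI. Everything specific here is an assumption made for illustration: the function names, the tanh hidden units, the learning rate, and above all the synthetic training rule in `make_toy_pairs`, which is made up and does not come from the paper. Only the overall shape (three inputs, one output, a hidden layer of 16 neurons) follows the text.

```python
import numpy as np


def train_mode_selector(inputs, targets, hidden=16, lr=0.05, epochs=500, seed=0):
    """Train a one-hidden-layer network by backpropagation (full batch).

    inputs  : (n_samples, 3) array of [PMI, CMI_x, CMI_y], scaled to [0, 1]
    targets : (n_samples, 1) array of CTI values
    Returns a predict function and the list of training losses.
    """
    rng = np.random.default_rng(seed)
    w1 = rng.normal(0.0, 0.5, size=(inputs.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.5, size=(hidden, 1))
    b2 = np.zeros(1)
    losses = []
    for _ in range(epochs):
        # Forward pass.
        h = np.tanh(inputs @ w1 + b1)
        out = h @ w2 + b2
        err = out - targets
        losses.append(float(np.mean(err ** 2)))
        # Backward pass: gradients of the mean squared error.
        g_out = 2.0 * err / len(inputs)
        g_w2 = h.T @ g_out
        g_b2 = g_out.sum(axis=0)
        g_h = (g_out @ w2.T) * (1.0 - h ** 2)
        g_w1 = inputs.T @ g_h
        g_b1 = g_h.sum(axis=0)
        # Gradient descent step.
        w1 -= lr * g_w1
        b1 -= lr * g_b1
        w2 -= lr * g_w2
        b2 -= lr * g_b2

    def predict(x):
        return np.tanh(np.asarray(x) @ w1 + b1) @ w2 + b2

    return predict, losses


def make_toy_pairs(n=200, seed=1):
    """Synthetic {PMI, CMI} -> CTI pairs; the rule below is made up."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=(n, 3))
    # Made-up rule: low likelihoods push the mode parameter up.
    y = 0.5 * x[:, :1] + 0.5 * (1.0 - x[:, 1:].mean(axis=1, keepdims=True))
    return x, y


if __name__ == "__main__":
    x, y = make_toy_pairs()
    predict, losses = train_mode_selector(x, y)
    print(losses[0], losses[-1])
```

In a tracker, `predict` would be called once per scan with the previous mode parameter and the current likelihoods, and its output would select the mode set for the next estimate.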


[Block diagram: PMI and CMI feed a backpropagation neural network, which outputs CTI.]

Figure 1 Mode set selection


Moreover, a neural net is an associative memory [4], which can infer the most probable pattern for untrained inputs. In other words, for an untrained pattern the neural net finds the most likely pattern in its memory. As a result, the neural net mode set selection logic does not require additional information for unknown moving patterns and can handle the unknown by mixing trained patterns.


3.3 Implementation of a Simple NNVSMM

Consider 2-D single target tracking without clutter. A simple neural net mode set selector is shown in Figure 2. A mode is determined by the process noise variance; that is, the mode parameter is the process noise variance. As a result, PMI and CTI are process noise variances. The likelihood of a mode is used for CMI. In this case the neural net computes the measurement noise variance from the previous measurement noise variance and the likelihood values.

For neural net training, independent multiple Kalman filters with different measurement noise variances are required. From a training scenario, we can obtain the true position, the state estimate and likelihood values of each filter, and the filter whose estimation error is the smallest. In this case a training pair is constructed as follows.

•  Input

a.  Measurement noise variance used in the previous estimation

b.  x-directional likelihood of the Kalman filter whose estimation error is smallest

c.  y-directional likelihood of the Kalman filter whose estimation error is smallest

•  Output

Measurement noise variance for the current estimation

This is one of the simplest implementations of a neural net based mode selector. In this paper we propose NNVSMM, which replaces Markov process based mode set selection with neural net based selection.


[Block diagram: the process noise variance at the previous scan, the x-direction likelihood at the current scan, and the y-direction likelihood at the current scan feed a backpropagation network, which outputs the process noise variance at the current scan.]

Figure 2 Realization of a simple neural net based mode selector


4. SIMULATION AND RESULT

In the simulation, the proposed algorithm, NNVSMM, is compared to DSVSMM [3] and the α-β filter [2]. Simulation parameters and environments are as follows.

•  2-D second order linear Kalman filter.

•  Measurement noise covariance of each sensor: 12m for each coordinate.

•  Varying process noise covariance determines the mode. The mode set is constructed based on the maneuvering index.

•  Range of process noise covariance: (1, 200), with a spacing of 5 between neighboring modes.

•  200 Monte-Carlo runs.

•  Simulation tool: Matlab 5.3.

Details on NNVSMM are as follows:

•  One hidden layer with 16 neurons for the training scenario.

•  For training data generation, 40 by 40 Kalman filters run in parallel.

•  The training scenario is given in Figure 3.

•  The test scenario is given in Figure 4.
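
The listed range and spacing can be checked with a short sketch. One reading of the parameters, and it is only an assumption since the paper does not spell out the grid, is that the modes start at 1 and are spaced 5 apart below 200. That gives 40 values per coordinate, which would be consistent with the 40 by 40 bank of Kalman filters used for training data generation.

```python
def mode_grid(low=1, high=200, spacing=5):
    """Process noise covariance values, one per mode (assumed grid layout)."""
    return list(range(low, high, spacing))


modes = mode_grid()
# Assumed interpretation: one filter per (x-mode, y-mode) combination.
n_filters = len(modes) ** 2

if __name__ == "__main__":
    print(len(modes), modes[0], modes[-1], n_filters)
```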


Figure  3  Training  Scenario 


Figure 4 Test Scenario


Figure 5 RMS errors of NNVSMM and DSVSMM


Maneuvering periods (scan)   NN-VSMM   DS-VSMM   α-β filter
11-13                        1.5791    4.6258    31.0747
21-23                        1.6138    3.7418    31.1780
31-33                        1.5083    2.9099    30.9601
41-43                        1.5944    2.5968    31.1208

Table 1 Comparison of RMS errors in maneuvering periods


Figure 5 and Table 1 illustrate the simulation results. In Figure 5, the average error of NNVSMM during the maneuvering periods is smaller than that of DSVSMM. The results indicate that the proposed NNVSMM reduces the RMS error by reducing the MJD phenomenon.

Moreover, the RMS error of NNVSMM stays at a level of around 1.6, whereas the RMS error of DSVSMM varies from 2.6 to 4.6. As a result, NNVSMM has a more stable performance than DSVSMM.

Another strong point of NNVSMM is that it can track movements on which it has not been trained. The training scenario includes three patterns: (1) constant velocity, (2) circular movement, and (3) constant small acceleration. In the test scenario, large acceleration changes occur. The RMS error of DSVSMM during maneuvering is about 2.6 to 4.6 times that during non-maneuvering, but for NNVSMM it is only up to 1.6 times. This indicates that the Markov process assumption in VSMM cannot cover all types of target motion and that VSMM must be redesigned whenever the moving pattern changes. NNVSMM, however, can interpolate unknown moving patterns from the training patterns. Consequently, using a neural network we can design a VSMM without any information on the mode transfer probability, which is not known in general. Therefore, if a training set that covers representative moving types is available, NNVSMM can be designed more easily than Markov based VSMM.


5.  SUMMARY 

In this paper, the MJD problem in VSMM is described. As a solution to the MJD problem, a new mode selection method based on a neural net is developed and presented. Based on the neural net mode selector, NNVSMM, which reduces the MJD effect in VSMM, is proposed. Representative simulations show that the RMS error of NNVSMM is smaller than those of DSVSMM and the α-β filter. For untrained moving patterns, NNVSMM is also shown to achieve better performance than DSVSMM. Another promising feature of the neural network based VSMM is that it can be designed without prior information on the mode transition matrix derived from the Markov process, which is not known in general.

ABSTRACT

This paper presents statistical signal processing approaches for clutter reduction in Stepped-Frequency Ground Penetrating Radar (SF-GPR) data. In particular, we suggest clutter/signal separation techniques based on principal and independent component analysis (PCA/ICA). The approaches are successfully evaluated and compared on real SF-GPR time-series. Field-test data are acquired using a monostatic S-band rectangular waveguide antenna.


1.  INTRODUCTION 

The development of techniques for automated detection of anti-personnel landmines from sensor signal measurements is a significant problem. This paper focuses on improving the signal-to-clutter ratio for detection systems based on ground penetrating radar (GPR) measurements. Clutter is characterized as signal components which are not directly correlated with primary scattering from mine objects. This comprises: measurement noise, disturbances from the antenna, inhomogeneities in the soil, scattering from rough surfaces, ground vegetation induced scattering, and to some extent multiple reflections. A number of recent clutter reduction approaches suggested in the literature cover: likelihood ratio testing [2], parametric system identification [3, 12, 15, 17], wavelet packet decomposition [4, 7], subspace techniques [8, 11, 18, 19], and simple mean scan subtraction [6].

We focus on unsupervised statistically based techniques for clutter reduction, in particular attenuation of surface disturbances. In Section 2 our previously suggested principal component analysis approach is revisited. Section 3 introduces a novel approach based on independent component analysis. Finally, Section 4 provides a comparative study on real GPR field test measurements.

2. PRINCIPAL COMPONENT ANALYSIS CLUTTER REDUCTION

Principal component techniques have previously been applied to GPR data analysis in [19] for detection of mines on preprocessed data using cross track-depth scans. In [18] clutter was reduced by reconstructing from the most significant eigenvectors, and [8] used a generalized singular value decomposition for separating noise and signal spaces. In [11] we took a different unsupervised approach where characteristics of the source signals (principal components) and associated eigenimages are used to determine the subspace for reconstruction.

Let $x_{ij}(t)$ denote the signal received at location x = (i − 1) cm, y = (j − 1) cm, where i = 1, 2, ..., I and j = 1, 2, ..., J. Traditional clutter reduction [6] consists in subtracting the mean scan across the xy-plane,

$$\tilde{x}_{ij}(t) = x_{ij}(t) - (IJ)^{-1} \sum_{i,j} x_{ij}(t).$$

This procedure removes the common signal across the xy-plane, which is believed to originate mainly from the very strong air-to-ground reflection. The approach taken here is inspired by explorative analysis of functional neuroimages and multimedia data [9, 13]. Define the P × N signal matrix $X = \{X_{p,t}\}$, $X_{p,t} = x_{ij}(t)$, where the pixel index $p = i + (j - 1) \cdot I \in [1; P]$, P = I · J, and t ∈ [1; N] is the time index, with N being the total number of time samples. Column t of the matrix then represents the xy-plane scan image at time t reshaped into a vector, and the signal matrix represents the sequence of xy-plane images along the time or z-direction. Usually P > N (in the present experiments, P = 51² = 2601 and N = 50). Since the rank of X is at most N, the SVD reads

$$X = U D V^T = \sum_{i=1}^{N} u_i D_{i,i} v_i^T, \qquad X_{p,t} = \sum_{i=1}^{N} U_{p,i} D_{i,i} V_{t,i} \qquad (1)$$

where the P × N matrix $U = \{U_{p,i}\} = [u_1, \ldots, u_N]$ and the N × N matrix $V = \{V_{t,i}\} = [v_1, \ldots, v_N]$ represent the orthonormal basis vectors, i.e., eigenvectors of the symmetric matrices $X X^T$ and $X^T X$, respectively. $D = \{D_{i,i}\}$ is an N × N diagonal matrix of singular values ranked in decreasing order, i.e., $D_{i-1,i-1} \ge D_{i,i}$, ∀i ∈ [2; N]. The SVD identifies a set of uncorrelated time sequences, the principal components (PC’s): $y_i = D_{i,i} v_i$, enumerated by the component index i = 1, 2, ..., N, with $y_i = [y_i(1), \ldots, y_i(N)]^T$. That is, we can write the observed signal matrix (image sequence) as a weighted sum of fixed eigenvectors (eigenimages) $u_i$ that often lend themselves to direct interpretation: some will contain mostly clutter, whereas others mainly mine reflections.

Consider the projection onto the subspace spanned by M selected PC’s which mainly contain information about the mine object, i.e., $Y = \tilde{U}^T X$, $\tilde{U} = [u_{i_1}, u_{i_2}, \ldots, u_{i_M}]$, where Y is an M × N matrix. The selection can be done by inspecting the structure of the eigenimage or the time course of $y_i(t)$. Ideally, if $y_i(t) = \delta(t - t_0)$ is a delta function, the structure of the eigenimage can be attributed to time $t_0$. The clutter is subsequently reduced by reconstructing X from the subspace, as given by $\hat{X} = \tilde{U} Y$.
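
The two clutter reduction steps of this section, mean scan subtraction and reconstruction from selected principal components, can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, the component selection `keep` is supplied by hand rather than derived from eigenimage inspection, and `toy_scan` produces made-up data (a strong common surface echo plus a weak local target).

```python
import numpy as np


def mean_scan_subtraction(X):
    """Subtract, at each time sample, the mean over all xy pixels."""
    return X - X.mean(axis=0, keepdims=True)


def pca_reconstruct(X, keep):
    """Reconstruct the P x N signal matrix from the selected components."""
    U, d, Vt = np.linalg.svd(X, full_matrices=False)  # X = U diag(d) Vt
    U_sel = U[:, keep]   # P x M selected eigenimages
    Y = U_sel.T @ X      # M x N projections (row i equals d_i * v_i^T)
    return U_sel @ Y     # P x N reconstruction


def toy_scan(I=5, J=5, N=8, seed=0):
    """Made-up data: strong common surface echo plus a weak local target."""
    rng = np.random.default_rng(seed)
    P = I * J
    surface = np.zeros(N)
    surface[2] = 10.0              # surface reflection at time index 2
    X = np.tile(surface, (P, 1))   # identical in every pixel
    X[P // 2, 5] += 1.0            # weak target in the center pixel
    X += 0.01 * rng.standard_normal((P, N))
    return X


if __name__ == "__main__":
    X = toy_scan()
    # Drop the strongest component, which here carries the surface echo.
    X_hat = pca_reconstruct(X, keep=list(range(1, X.shape[1])))
    print(np.linalg.norm(X), np.linalg.norm(X_hat))
```

Keeping all N components returns X unchanged, so the clutter reduction comes entirely from the choice of `keep`.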

3.  INDEPENDENT  COMPONENT  ANALYSIS  CLUTTER 
REDUCTION 

The spirit of the suggested method for independent component analysis (ICA) clutter reduction resembles that of the principal component based technique. The major difference is that the subspace formed by ICA is not orthogonal as in PCA. Moreover, the independent components (IC’s), which are the counterparts of the PC’s, are statistically independent. We thus expect the IC’s to have a more distinct time localization.

Suppose that X is first projected onto a subspace spanned by eigenvectors of non-zero eigenvalues, as we cannot model from the null space [13]. Typically the dimension, d, of the signal subspace will be somewhat smaller than N. Let $\tilde{U}$ be the P × d matrix of eigenvectors, and $\tilde{X} = \tilde{U}^T X$ the projected signal matrix. The ICA problem is defined as $\tilde{X} = AS$, where A is the d × M, M ≤ d, matrix of mixing coefficients and S is the M × N matrix of IC’s, also referred to as source signals. That is, the original signal matrix is reconstructed as $\hat{X} = WS = \sum_{i=1}^{M} w_i s_i^T$, where $W = \tilde{U} A$ is the matrix of eigenimages and $s_i = [s_i(1), \ldots, s_i(N)]^T$ is the i’th source signal. The literature provides a number of algorithms for estimating A and S¹. Basically they can be divided into two families: the first deploys higher (or lower) order moments of non-Gaussian sources, whereas the other uses the time correlation of the source signals. In the present case we expect the sources to be both non-Gaussian and colored. We deploy a member of each family: the widely used Bell-Sejnowski [1] algorithm using natural gradient learning, and the Molgedey-Schuster algorithm [9, 16]. Both are able to estimate A and S up to scaling factors and permutations of the source signals.
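
The time-correlation family can be illustrated with a minimal sketch of the decorrelation idea behind the Molgedey-Schuster approach: if the sources are mutually uncorrelated at lag 0 and at lag τ, the mixing matrix appears as the eigenvectors of the lagged correlation matrix multiplied by the inverse of the zero-lag correlation matrix. The sketch makes several assumptions that are not taken from the paper: square mixing (M = d), a circular time shift for the lagged correlation, and a made-up two-source test mixture in `toy_mixture`. It is not the authors' implementation.

```python
import numpy as np


def molgedey_schuster(X, tau=1):
    """Estimate mixing matrix A and sources S from X = A S (square case).

    X is a d x N matrix whose rows are the mixed signals.
    """
    X = X - X.mean(axis=1, keepdims=True)
    X_tau = np.roll(X, -tau, axis=1)           # circular shift (a simplification)
    C0 = X @ X.T                               # zero-lag correlation
    Ctau = 0.5 * (X_tau @ X.T + X @ X_tau.T)   # symmetrized lagged correlation
    Q = Ctau @ np.linalg.inv(C0)               # equals A * diag * inv(A)
    _, A = np.linalg.eig(Q)                    # eigenvectors estimate columns of A
    A = np.real(A)
    S = np.linalg.solve(A, X)                  # sources up to scale and order
    return A, S


def toy_mixture(N=200):
    """Made-up example: two sinusoids with different autocorrelations."""
    t = np.arange(N)
    S_true = np.vstack([np.sin(2 * np.pi * 3 * t / N),
                        np.sin(2 * np.pi * 11 * t / N)])
    A_true = np.array([[1.0, 0.6], [0.4, 1.0]])
    return A_true @ S_true, S_true


if __name__ == "__main__":
    X, S_true = toy_mixture()
    A, S = molgedey_schuster(X, tau=1)
    print(np.abs(np.corrcoef(np.vstack([S, S_true]))[:2, 2:]))
```

The recovered rows of S match the true sources only up to scaling and permutation, which is the ambiguity stated in the text. The method needs the sources to have distinct normalized autocorrelations at the chosen lag.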

¹ For a recent review the reader is referred to [14].

4. EXPERIMENTS

A comparison of the PCA and ICA methods for clutter reduction in GPR signals was performed on field-test Stepped-Frequency GPR data. The field-test data were collected using a monostatic S-band waveguide antenna operating in the frequency range 2.65 - 3.95 GHz. The data were acquired using an HP8753C network analyzer. The bandwidth of the antenna determines the resolution, which is approx. 11.5 cm. After antenna deembedding [11] the signals were down-mixed to the base band in order to remove the carrier [6]. The deployed sampling frequency is 5.12 GHz, which corresponds to a free-space sampling of 2.93 cm in the depth direction, which is below the resolution set by the antenna bandwidth.
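
The quoted figures are consistent with the standard two-way free-space relations, resolution = c / (2B) and depth sample spacing = c / (2 f_s). The paper does not state which formulas were used, so the short check below rests on that assumption; the function names are illustrative.

```python
C = 299_792_458.0  # speed of light in free space, m/s


def range_resolution_cm(bandwidth_hz):
    """Two-way range resolution c / (2 B), in centimeters."""
    return 100.0 * C / (2.0 * bandwidth_hz)


def depth_sample_spacing_cm(sampling_hz):
    """Two-way free-space distance per time sample c / (2 f_s), in centimeters."""
    return 100.0 * C / (2.0 * sampling_hz)


bandwidth = 3.95e9 - 2.65e9                     # 1.3 GHz antenna bandwidth
resolution = range_resolution_cm(bandwidth)     # about 11.5 cm
spacing = depth_sample_spacing_cm(5.12e9)       # about 2.93 cm

if __name__ == "__main__":
    print(round(resolution, 1), round(spacing, 2), round(resolution / spacing, 2))
```

The ratio of the two values is close to 4, i.e., one resolution cell spans roughly four time samples.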

[Image panels: columns Iron (A, B) and Plastic (A, B).]


Fig. 1. Cross section (xt) images. The mine is located at the center in the x-direction and at t = 16 (2nd axis). The two left and the two right columns summarize results for iron and plastic mines, respectively. The A columns correspond to reconstruction from components where only surface reflections are removed, and the B columns to reconstruction from the strongest mine component, see Figure 2. The rows are: raw data, mean subtraction method, PCA, Molgedey-Schuster ICA (MS), and Bell-Sejnowski ICA (BS). The raw data show only the air-to-ground reflection, whereas the mean method helps somewhat in reducing the strong surface reflection. PCA seems to offer a slight improvement over the mean method, but MS does not provide much improvement and furthermore seems to enhance multiple reflections. BS, on the other hand, yields a significant improvement, in particular when reconstructing from the strongest mine component only.

In a measurement area of 51 cm × 51 cm, M56 mine dummies² of iron and plastic (filled with beeswax) were buried in the center of the field in relatively dry sand 5 cm below the surface. The resulting signal matrices have P = 51² = 2601 and N = 50. The signal space dimension is d = 22 for the iron mine and 17 for the plastic mine. Using a smaller area resulted in signal matrices with too low a signal space dimension. When using the Bell-Sejnowski algorithm, experiments show that appropriate learning rates are $10^{-4}$ and $10^{-3}$ for the metal and plastic mine experiments, respectively. The performance of the Molgedey-Schuster algorithm turned out to be quite sensitive to the lag value, τ, but τ = 1 gave the best performance.

² Dimensions are: diameter 5.4 cm, and height 4 cm.

In Fig. 2 the eigenimages and associated PC’s and IC’s are depicted. ICA algorithms do not have any natural ordering. Since peak locations of the source signals determine the depth of scattering objects, we choose to first rank according to peak locations occurring before the strong air-to-ground reflection at t = 16. Next, the components are ordered with respect to their variance contribution in the reconstructed signal matrix [10], which for component i is $|w_i|^2 \cdot \mathrm{Var}\{s_i(t)\}$.
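
The variance-based part of this ordering can be sketched as follows. The sketch covers only the second ranking step, not the preceding ranking by peak location. The function names are illustrative, and the use of the population variance is an assumption, since the text does not say which variance estimator was used.

```python
from statistics import pvariance


def variance_contributions(eigenimages, sources):
    """Return |w_i|^2 * Var{s_i(t)} for each component i.

    eigenimages : list of columns w_i (each a list of P pixel values)
    sources     : list of source signals s_i (each a list of N samples)
    """
    return [sum(w * w for w in w_i) * pvariance(s_i)
            for w_i, s_i in zip(eigenimages, sources)]


def order_by_variance(eigenimages, sources):
    """Component indices sorted by decreasing variance contribution."""
    contrib = variance_contributions(eigenimages, sources)
    return sorted(range(len(contrib)), key=lambda i: contrib[i], reverse=True)


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 2.0]]                    # two toy eigenimages
    S = [[0.0, 2.0, 0.0, 2.0], [1.0, 1.0, 1.0, 3.0]]  # two toy sources
    print(variance_contributions(W, S), order_by_variance(W, S))
```

Because ICA fixes the sources only up to scale, the product of eigenimage energy and source variance is a scale-invariant way to compare components.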

The eigenimages of the iron mine experiments nearly all show very strong mine signatures, which are, however, more clearly pronounced for the ICA algorithms. It should be noticed that the added contribution from more components can display surface-like texture. For instance, the contributions from components 1 and 4 of PCA will add to a more blurred overall contribution. The source signals of PCA and Molgedey-Schuster do not possess good time localization, thus the associated eigenimages cannot be attributed to a particular depth. This also makes the selection of components for reconstruction somewhat unclear. On the other hand, the Bell-Sejnowski algorithm produces very peaked source signals. E.g., component 5, which clearly peaks right after the surface reflection, also has a strong mine signature in its eigenimage. In addition, the width of the source peak is approximately 4 samples, which corresponds to the resolution determined by the bandwidth of the antenna. Thus, source signals with peak widths of less than 4 samples do not make sense. The results for the plastic mine show that the mine signature is much less pronounced, i.e., the signal-to-clutter ratio is low. Component 5 has a strong mine signature and is furthermore located at t = 18, which is at the mine location. Recall that the mine has an extension of approx. 5 cm, which is half the resolution set by the antenna bandwidth. The reconstructed cross-section images are shown in Figure 1.

5.  CONCLUSION 

This paper provided a comparative study of PCA and ICA algorithms for clutter reduction. In particular the Bell-Sejnowski ICA showed significant improvement over PCA and Molgedey-Schuster ICA on real field GPR measurements. Future studies will focus on methods for automatic selection of subspace components and on convolutive ICA methods.

Fig. 2. Eigenimages and associated source signals, i.e., PC’s or IC’s. The vertical lines in the source signal pictures indicate the time corresponding to the position of the ground surface. Note that only the first 6 components are shown; the remaining source signals peak at later times and have smaller variance contributions.

ABSTRACT

We address the problem of removing specular ground surface reflections and leakage/cross-talk from downward looking stepped frequency ground-penetrating radar (GPR) data. A new model for the ground-bounce and the leakage/cross-talk is introduced. An algorithm that jointly estimates these effects from collected data is presented. The algorithm has the sound foundation of a non-linear least squares (LS) fit to the presented model. The minimization is performed in a cyclic manner where one step is a linear LS minimization and the other step is a non-linear LS minimization where the optimum can efficiently be found using, e.g., the chirp-transform algorithm. The results after applying the algorithm to measured GPR data, collected at a U.S. army test range, are also shown.

1.  INTRODUCTION 

During the last decades, the enormous problems of buried landmines have raised the interest in subsurface exploration using ground-penetrating radar (GPR) [1]. Radar is an attractive type of sensor since it has the potential to detect anything with electromagnetic contrast (a term comprising permittivity and/or conductivity and/or permeability) to the surrounding medium, as opposed to, e.g., metal detectors that can only detect objects with sufficiently high metal content. For an overview of some emerging technologies for mine detection, see, e.g., Chapter 2 of [2].

For a downward looking GPR system, which has the geometry that transfers the most radiated power to the subsurface, the collected data is severely contaminated by specular reflections from the ground surface. These reflections, usually referred to as the ground-bounce, can greatly surpass and hide the weak return of a shallowly buried plastic mine. The ground-bounce effect can be eliminated by positioning the antennas in direct contact with the soil so that no ground-bounce is allowed to form. For obvious reasons, this is not a feasible solution for the mine detection application of GPR. To be able to detect a possible mine in a reliable manner, we instead have to estimate and remove the ground-bounce from the data without impairing the return of the mine significantly.

Another source of data degradation is leakage/crosstalk between the antennas and reflections from different parts of the platform itself: effects that in general cannot be eliminated completely by judicious system design. In contrast to the reflections from the ground surface, which may be subject to significant variations in radar cross section (RCS) and time of arrival, the leakage/cross-talk effects are mostly constant with antenna position. Nevertheless, these effects make the estimation of the ground-bounce difficult. It has been proposed that these leakage effects could be estimated by pointing the platform up at the sky and making a measurement without ground or subsurface reflections present. This measurement is then subtracted from the data prior to the ground-bounce removal. However, since the antennas operate in their near-field range, the antenna gain patterns are affected by objects and material present within the antenna beams. For this reason the leakage/crosstalk and the platform reflections measured during the sky-shot can be quite different from those measured when the antennas are placed closely above the ground, as in a normal operation mode.

In this paper we introduce a new model that takes both the
leakage/cross-talk and the ground-bounce into account. Based on
this model, we present a new algorithm named DILBERT (an acronym
for Decoupled Iterative Leakage and ground-Bounce Estimation and
Removal Technique) that jointly estimates the leakage and the
ground-bounce.

2.  DATA  MODEL 

For a stepped frequency radar system, a sequence of sinusoids with
frequencies $\omega_k$, $k = 0, \ldots, K-1$, is transmitted at each
antenna position $n = 0, \ldots, N-1$. For each $\omega_k$ and $n$
the amplitude and phase of the return signal (as compared to the
transmitted signal) are recorded as a complex value $x_{n,k}$. The
propagation time, $\tau_n$, associated with a specific scattering
object then appears as the slope of the component of the recorded
phase that is linear in $\omega_k$. Hence, the range of the
scattering object at a specific antenna position can be found using
a Fourier transform w.r.t. $\omega_k$. The scattering object will
then show up in the range image as the Fourier transform of its
complex RCS with respect to frequency, centered at the time
$t = \tau_n$.
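As a minimal sketch of this step (an illustration, not code from the paper), the snippet below forms a range profile by evaluating the Fourier sum directly over a grid of candidate delays. The frequency grid, the single ideal point scatterer, and the helper name `range_profile` are assumptions made for the example; the phase sign follows the $e^{j\omega_k\tau_n}$ convention of the data model.

```python
import cmath
import math

def range_profile(x, omega, delays):
    """Magnitude of sum_k x[k] * exp(-j * omega[k] * t) for each candidate delay t."""
    return [abs(sum(xk * cmath.exp(-1j * wk * t) for xk, wk in zip(x, omega)))
            for t in delays]

# Hypothetical stepped-frequency sweep: K tones between 0.5 GHz and 4 GHz.
K = 64
omega = [2 * math.pi * (0.5e9 + k * (3.5e9 / (K - 1))) for k in range(K)]

# One ideal point scatterer: unit amplitude, phase linear in omega with slope tau.
tau_true = 3.0e-9
x = [cmath.exp(1j * wk * tau_true) for wk in omega]

# Candidate delays from 0 to 10 ns in 0.05 ns steps (inside the unambiguous range).
delays = [i * 0.05e-9 for i in range(201)]
profile = range_profile(x, omega, delays)
tau_hat = delays[profile.index(max(profile))]
```

The peak of `profile` sits at the delay whose phase ramp matches the data. A zero-padded FFT would give the same profile on a uniform delay grid at lower cost; the direct sum is used here only to keep the example dependency-free.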

The model we propose for the collected data contaminated by leakage
and ground-bounce can be written as

$$x_{n,k} = y_{n,k} + c_k + a_n b_k e^{j\omega_k \tau_n}, \qquad (1)$$

where $y_{n,k}$ is the contribution from a possible mine, $c_k$
comprises the unknown leakage/cross-talk and platform reflections
that are assumed to be constant with platform position, $b_k$ is the
frequency response of an unknown reference ground-bounce profile,
$a_n$ allows for variations in the complex valued ground surface RCS
with platform position, and $\tau_n$ accounts for a time shift of the
reference ground-bounce profile and hence models surface height
variations.

We note that the last term in (1) is ambiguous by a complex
constant. However, since we are interested in the last two terms as
a unit (for later subtraction from the data) rather than in the
different parameters themselves, this is not necessarily a problem.
The nuisance parameters are only present to exploit the structure
of the problem, and even if a complex constant may move between the
different parameters, the least squares fit that is presented in
the following section will still be good. To guarantee that neither
$b_k$ nor $a_n$ grows infinitely large while the other vanishes,
with numerical problems as a result, one of these parameters needs
to be fixed. In our case we choose $b_0 = 1$.

The model above is an extension of a frequency domain equivalent of
the model proposed in [3] for an impulse based GPR system. The model
in [3], however, did not take the leakage/cross-talk term into
account and it also assumed that $b_k$ was known.


3. ALGORITHM

With the assumption that the antenna main-lobe is narrow or that
the mine response is significantly weaker than the ground surface
reflection, a LS design criterion for a method based on our model
is

$$\min_{\{c_k\},\{b_k\},\{a_n\},\{\tau_n\}} \sum_{n=0}^{N-1}\sum_{k=0}^{K-1} \left| x_{n,k} - c_k - a_n b_k e^{j\omega_k \tau_n} \right|^2, \qquad (2)$$

which is non-linear in the parameters to be estimated and cannot
be minimized in closed form.

Our approach is to split (or decouple) the non-linear LS (NLS)
criterion in (2) into two more tractable minimization problems by
fixing two different subsets of the parameters and then use a
cyclic algorithm that alternates between solving the two new
optimization problems.

3.1. Minimization with respect to $\{a_n, \tau_n\}$

By assuming that $c_k$ and $b_k$ are known and fixed, the model in
(1) can be reduced to the frequency domain equivalent of that
proposed in [3] by subtracting the leakage term from the data,
$\tilde{x}_{n,k} = x_{n,k} - c_k$. In [4] the solution to the
resulting NLS minimization problem

$$\{\hat{a}_n, \hat{\tau}_n\}_{n=0}^{N-1} = \arg\min_{\{a_n, \tau_n\}_{n=0}^{N-1}} \sum_{n=0}^{N-1}\sum_{k=0}^{K-1} \left| \tilde{x}_{n,k} - a_n b_k e^{j\omega_k \tau_n} \right|^2, \qquad (3)$$

is shown to be

where $(\cdot)^*$ denotes the complex conjugate. In the case of a
vector or matrix operand, $(\cdot)^*$ will also denote the conjugate
transpose.

Equations (4) and (5) can be interpreted as a correlation between
$\tilde{x}_{n,k}$ and $b_k$ in the frequency domain, where
$\hat{\tau}_n$ is the lag for the maximum of the correlation
function and $\hat{a}_n$ is the complex amplitude at that maximum.
Hence, the parameter estimates can be found using Fourier
transforms. For a coarse estimation of $\tau_n$ a zero-padded FFT
can be used. For an estimate with higher resolution a
chirp-transform algorithm may be used locally around the coarse
estimate to find the maximum with higher precision or, since local
convexity is likely, some fast search method can be applied.

3.2. Minimization with respect to $\{b_k, c_k\}$

By assuming that $a_n$ and $\tau_n$ are known, the NLS criterion in
(2) reduces to the linear LS criterion

$$\{\{\hat{c}_k\}_{k=0}^{K-1}, \{\hat{b}_k\}_{k=1}^{K-1}\} = \arg\min \|\mathbf{x} - \mathbf{A}\mathbf{b}\|^2, \qquad (6)$$

where

$$\mathbf{x} = [\mathbf{x}_0^T \ \mathbf{x}_1^T \ \cdots \ \mathbf{x}_{N-1}^T]^T$$

where $\mathbf{I}_K$ is an identity matrix of size $K \times K$.

The solution to (6) can be found as (see, e.g., [5])

$$\hat{\mathbf{b}} = (\mathbf{A}^*\mathbf{A})^{-1}\mathbf{A}^*\mathbf{x}. \qquad (7)$$

The matrix $\mathbf{A}$ is built of diagonal blocks and zeros. This
structure can be used to reduce the computational cost of finding
the estimates. Writing (7) in terms of the blocks $\mathbf{A}_1$ and
$\mathbf{A}_2$ gives us

By applying the inversion on the partitioned matrix in (8) (see,
e.g., [6] for the inversion of a partitioned matrix) we end up
(after some straightforward algebra) with the estimates as

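$$\hat{c}_0 = \frac{1}{N}\sum_{n=0}^{N-1}\left(x_{n,0} - \hat{a}_n e^{j\omega_0\hat{\tau}_n}\right), \qquad (9)$$

$$\hat{c}_k = \frac{\bar{x}_k \sum_{n=0}^{N-1}|\hat{a}_n|^2 - \frac{1}{N}\left(\sum_{n=0}^{N-1}\hat{a}_n^* e^{-j\omega_k\hat{\tau}_n}x_{n,k}\right)\left(\sum_{n=0}^{N-1}\hat{a}_n e^{j\omega_k\hat{\tau}_n}\right)}{\sum_{n=0}^{N-1}|\hat{a}_n|^2 - \frac{1}{N}\left|\sum_{n=0}^{N-1}\hat{a}_n e^{j\omega_k\hat{\tau}_n}\right|^2}, \quad k = 1, \ldots, K-1, \qquad (10)$$

$$\hat{b}_k = \frac{\sum_{n=0}^{N-1}\hat{a}_n^* e^{-j\omega_k\hat{\tau}_n}x_{n,k} - \bar{x}_k \sum_{n=0}^{N-1}\hat{a}_n^* e^{-j\omega_k\hat{\tau}_n}}{\sum_{n=0}^{N-1}|\hat{a}_n|^2 - \frac{1}{N}\left|\sum_{n=0}^{N-1}\hat{a}_n e^{j\omega_k\hat{\tau}_n}\right|^2}, \quad k = 1, \ldots, K-1, \qquad (11)$$

where $\bar{x}_k = \frac{1}{N}\sum_{n=0}^{N-1} x_{n,k}$ is the mean of
the data taken over platform positions.

3.3. Algorithm summary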


The algorithm is initiated by assuming no leakage ($c_k = 0$) and
the ground-bounce reference response as $b_k = x_{0,k}/x_{0,0}$.
These values are used in (4) and (5) to obtain estimates of
$\tau_n$ and $a_n$. These estimates are then plugged into (9), (10)
and (11) to refine the estimates of $c_k$ and $b_k$. The procedure
is repeated until practical convergence is achieved (e.g., until
the relative change in the cost function in (2) is less than some
prespecified threshold, say $10^{-5}$).

Since in each step we minimize (2) with respect to a subset of the
parameters and (2) is bounded from below (the design criterion is
non-negative), the value of the criterion is guaranteed to converge.
However, we cannot in general guarantee that the convergence is to
the global minimum.
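The cyclic procedure can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the delay step uses a plain grid search in place of the zero-padded FFT, its closed form is the standard least squares solution of (3) (the lag of the correlation maximum and the normalized correlation value there), and the second step solves the 2 x 2 normal equations per frequency, which should agree with the closed forms (9) to (11). All function names and the synthetic data are hypothetical.

```python
import cmath
import math

def cost(x, c, b, a, tau, omega):
    """Sum over n, k of |x[n][k] - c[k] - a[n] * b[k] * exp(j * omega[k] * tau[n])|^2."""
    return sum(abs(x[n][k] - c[k] - a[n] * b[k] * cmath.exp(1j * omega[k] * tau[n])) ** 2
               for n in range(len(x)) for k in range(len(omega)))

def step_a_tau(x, c, b, omega, grid):
    """Hold (c, b) fixed: correlate the leakage-free data with b over a delay grid."""
    energy = sum(abs(bk) ** 2 for bk in b)
    a, tau = [], []
    for xn in x:
        corr = [sum(bk.conjugate() * (xk - ck) * cmath.exp(-1j * wk * t)
                    for xk, ck, bk, wk in zip(xn, c, b, omega)) for t in grid]
        m = max(range(len(grid)), key=lambda i: abs(corr[i]))
        tau.append(grid[m])            # lag of the correlation maximum
        a.append(corr[m] / energy)     # complex amplitude at that maximum
    return a, tau

def step_c_b(x, a, tau, omega, b_prev):
    """Hold (a, tau) fixed: per-frequency linear LS for (c_k, b_k), keeping b_0 = 1."""
    n_pos = len(x)
    s_aa = sum(abs(an) ** 2 for an in a)
    c, b = [], []
    for k, wk in enumerate(omega):
        g = [an * cmath.exp(1j * wk * tn) for an, tn in zip(a, tau)]
        s_g = sum(g)
        s_x = sum(xn[k] for xn in x)
        s_gx = sum(gn.conjugate() * xn[k] for gn, xn in zip(g, x))
        det = n_pos * s_aa - abs(s_g) ** 2
        if k == 0 or det <= 1e-9 * n_pos * s_aa:
            bk = b_prev[k]                      # b_0 stays 1; a singular system keeps b_k
            ck = (s_x - bk * s_g) / n_pos
        else:
            ck = (s_x * s_aa - s_g * s_gx) / det
            bk = (n_pos * s_gx - s_g.conjugate() * s_x) / det
        c.append(ck)
        b.append(bk)
    return c, b

def dilbert(x, omega, grid, tol=1e-5, max_iter=30):
    c = [0j] * len(omega)                                   # start with no leakage
    b = [1 + 0j] + [xk / x[0][0] for xk in x[0][1:]]        # reference from position 0
    costs = []
    for _ in range(max_iter):
        a, tau = step_a_tau(x, c, b, omega, grid)
        c, b = step_c_b(x, a, tau, omega, b)
        costs.append(cost(x, c, b, a, tau, omega))
        if len(costs) > 1 and costs[-2] - costs[-1] <= tol * costs[-2]:
            break
    return c, b, a, tau, costs

# Synthetic example (all values hypothetical): leakage plus a moving ground bounce.
K, N = 16, 10
omega = [2 * math.pi * (0.5e9 + k * 3.5e9 / (K - 1)) for k in range(K)]
grid = [i * 0.02e-9 for i in range(101)]                    # candidate delays, 0 to 2 ns
c_true = [0.4 * cmath.exp(0.5j * k) for k in range(K)]
b_true = [math.exp(-0.03 * k) * cmath.exp(0.1j * k) for k in range(K)]
tau_true = [0.6e-9 + 0.05e-9 * n for n in range(N)]         # linearly changing height
a_true = [1 + 0.2 * math.cos(n) for n in range(N)]
x = [[c_true[k] + a_true[n] * b_true[k] * cmath.exp(1j * omega[k] * tau_true[n])
      for k in range(K)] for n in range(N)]
c, b, a, tau, costs = dilbert(x, omega, grid)
```

Because each half-step minimizes the same criterion over a subset of the parameters, the recorded `costs` never increase from one iteration to the next, which mirrors the convergence argument above. The cleaned data would then be `x[n][k] - c[k] - a[n] * b[k] * exp(j * omega[k] * tau[n])`.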


3.4.  Reduced  DILBERT 


If we assume that no leakage/cross-talk is present ($c_k = 0$), (11)
reduces to

The estimates for $\tau_n$ and $a_n$ are still found using (4) and
(5) with the cosmetic modification that $\tilde{x}_{n,k} = x_{n,k}$.
We will refer to this algorithm based on the “smaller” model as the
reduced DILBERT since it does not account for the leakage term.
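As a worked check of the reduced update (derived here directly from model (1) with $c_k = 0$, not quoted from the paper), the per-frequency least squares problem has $b_k$ as its only unknown. Setting the derivative of $\sum_n |x_{n,k} - b_k \hat{a}_n e^{j\omega_k\hat{\tau}_n}|^2$ with respect to $b_k^*$ to zero gives

$$\hat{b}_k = \frac{\sum_{n=0}^{N-1} \hat{a}_n^* e^{-j\omega_k \hat{\tau}_n} x_{n,k}}{\sum_{n=0}^{N-1} |\hat{a}_n|^2}, \qquad k = 1, \ldots, K-1,$$

with $b_0 = 1$ kept fixed as before. The denominator no longer contains the term $\frac{1}{N}|\sum_n \hat{a}_n e^{j\omega_k\hat{\tau}_n}|^2$, which in the full model accounts for the part of the data explained by a constant leakage term.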


4.  RESULTS 

In this section we present images of data processed using the new
algorithms, compared with images where no ground-bounce or
leakage/cross-talk processing has been applied and with images
where the along track mean has been removed (i.e., processing based
on a constant ground-bounce model). We have used stepped frequency
data with a frequency range from 0.5 GHz to 4 GHz. The data was
provided by Planning Systems Incorporated and was recorded at a
U.S. Army test range.
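The mean-removal baseline used for comparison is simple enough to state in a few lines. The sketch below assumes the data is held as a list of per-position frequency sweeps `x[n][k]`; the function name and the toy data are hypothetical, and the text does not say whether the authors subtracted the mean before or after transforming to the range domain (the operation is linear, so the result is the same).

```python
def remove_along_track_mean(x):
    """Subtract, for every frequency k, the mean over antenna positions n."""
    n_pos = len(x)
    n_freq = len(x[0])
    mean = [sum(x[n][k] for n in range(n_pos)) / n_pos for k in range(n_freq)]
    return [[x[n][k] - mean[k] for k in range(n_freq)] for n in range(n_pos)]

# Toy data: a constant complex offset plus a part that changes with position.
x = [[1 + 2j + n * (k + 1) for k in range(3)] for n in range(4)]
y = remove_along_track_mean(x)
```

Anything that is constant with position, such as the leakage term, is removed exactly. A ground bounce whose delay or amplitude changes along the track is only partly removed, which is the limitation the examples below illustrate.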

In the first example (Fig. 1) the measurements were made over an
M19 plastic mine. To simulate larger ground surface variations, the
antenna elevation was changed linearly as the antenna moved along
the track. In the unprocessed image (Fig. 1(a)) we can see both the
changing ground-bounce and the more stable leakage/cross-talk
effects. In Fig. 1(b) the latter effects are satisfactorily removed
by the mean subtraction while most of the ground-bounce is left
untouched. For the reduced DILBERT (Fig. 1(c)) more of the ground
surface reflection has been removed. However, the algorithm is
adversely affected by the presence of the leakage/cross-talk and
cannot remove either of the degrading effects satisfactorily. As
expected, the DILBERT algorithm, which takes both the leakage and
the ground-bounce terms into account (Fig. 1(d)), performs much
better than the reduced version.

Fig. 1. Images of an M19 plastic mine buried at 5 cm depth and
along track position 0.5 m.

In the next example (Fig. 2) the ground surface is more stationary
with $n$. The roll-off observed at the along track edges is a result
of the synthetic aperture radar (SAR) processing applied after the
ground-bounce removal in the last two examples. Due to the more
constant ground surface reflection the mean removal now works much
better than in the first example, but large portions of the
ground-bounce still remain. The reduced version of our model is
also more accurate than in the previous example, and consequently
the reduced DILBERT algorithm performs acceptably. The full DILBERT
removes even more of the unwanted effects. However, the extra
degrees of freedom available also cause the top of the mine to be
partly included in the estimate of the ground-bounce, so that the
mine return is reduced in intensity. The risk of including the mine
in the ground-bounce estimate could be reduced by imposing a
smoothness constraint on the time of arrival of the ground-bounce.
This can, e.g., be done by replacing the sub-algorithm in Section
3.1 by the procedure described in [7], where the discretized
derivatives of $\tau_n$ are penalized. A more ad hoc but
computationally more appealing alternative would be to apply a
window on (4) centered on $\hat{\tau}_{n-1}$ before finding the
maximum to get $\hat{\tau}_n$.

In the last example (Fig. 3) we show results for data collected
over a metal mine where the mine return is considerably stronger
than for the plastic mines in the earlier examples. Again we see
acceptable performance by both new algorithms. For DILBERT, Fig.
3(d), we even distinguish two dominant scatterers, which we
interpret as a part of the top structure of the mine.

Fig. 2. Images of a VS 16 plastic mine buried at 2.5 cm depth and
along track position 0.75 m.

Fig. 3. Images of an M15 metal mine buried at 7.5 cm depth and
along track position 0.45 m.

The rate of convergence is very fast for the reduced DILBERT
(typically less than 5 iterations, all of which are computationally
very cheap). For the full version, the number of iterations needed
is usually larger. This can be expected as there are many more
parameters to estimate. Also, in the case of a ground surface that
is nearly flat and with a surface RCS that is almost constant with
antenna position, the problem of estimating both the ground-bounce
and the leakage/cross-talk can become ill-conditioned or even
ill-posed. However, during our data processing we have not
encountered any numerical problems even with the datasets with the
smoothest ground surfaces. Furthermore, even if the problem of
estimating the parameters in (1) is ill-conditioned (or even
singular, i.e., no unique solution exists), the result of applying
the algorithm to remove the ground-bounce and leakage effects may
still be satisfactory.

For a running version of the full DILBERT, the convergence speed
should not be a problem. The algorithm can be allowed to converge
by moving the platform slowly over the first few positions. Since
the leakage and the ground-bounce profile are approximately
constant, and hence previous values of $c_k$ and $b_k$ provide good
initial estimates, a few iterations should be sufficient for
convergence at the subsequent antenna positions.

5.  CONCLUSIONS 

We have introduced a new model for ground-bounce and
leakage/cross-talk effects in stepped frequency ground-penetrating
radar data. Based on the model, a novel least squares based cyclic
algorithm (DILBERT) for removal of these effects has been presented
together with a reduced version (reduced DILBERT) where the
leakage/cross-talk is not taken into account. All the steps in the
cyclic algorithms can be efficiently solved using FFTs and simple
vector multiplications.

Results of applying the two algorithms to measured GPR data have
been shown. In the case of a ground surface changing significantly
in height, the full DILBERT outperforms the reduced version. When
the ground surface is more constant, the reduced DILBERT also
eliminates more of the ground-bounce. The full DILBERT still
removes more of the ground-bounce and leakage/cross-talk than the
reduced version.

In our experience, despite the fact that it is not guaranteed that
the global minimum is always achieved when solving (2), the
algorithms provide good means for removing the ground-bounce and
the leakage/cross-talk effects from ground-penetrating radar data.

ABSTRACT

A pipelined algorithm and parallel architecture are under
development for real-time detection of landmines. Our previous work
has dealt with monochromatic images from airborne active infrared
scanners and images from a low-altitude aircraft-mounted
multi-spectral scanner. Because of the nature of the sensors and
the aerial observation platform, the landmines were treated as
small, sparse, discrete objects in a large clutter field. Our
current work deals with passive infrared imagery obtained from
cameras mounted on a ground vehicle. In contrast to the previous
work, although the targets are still relatively sparse, they are no
longer small in the sense of occupying just a few pixels, and the
signal-to-noise ratio is considerably worse than for the airborne
active infrared and multi-spectral scanner problems. Significant
changes to our detection algorithm are therefore needed. The paper
briefly describes the overall algorithm and the particular issues,
such as irregular shapes, that need to be dealt with in FLIR
imagery. Some early results are presented. In addition, changes in
computer processing power and inter-processor communications have
led to a rethink of the real-time hardware implementation of the
system, and these issues are discussed in the paper.


1.  INTRODUCTION 

For several years, the Canadian Defense Research Establishment in
Suffield and the University of British Columbia have been jointly
developing a pipelined algorithm and parallel architecture for
real-time detection of landmines in images. Minefields are a
serious threat to land combat forces because they very effectively
impede mobility and leave stationary troops highly vulnerable.
Detecting minefields from a distance, referred to as standoff or
remote minefield detection, is a high priority among a number of
nations including Canada. A typical minefield image contains a
significant number of compact target objects that are sparsely
distributed over a large area and may have particular spatial
relationships to one another.

Since the research began on the minefield detection algorithm, it
has become apparent that the algorithm might have applications
other than the Airborne Active Infrared (AAIR) monochromatic
imagery, which is obtained from a low-altitude aircraft-mounted
multi-spectral scanner. For example, a high priority project within
the Canadian Department of National Defense is the development of a
vehicle-mounted system of multiple mine detectors for use on roads.
A passive infrared imager, which produces Forward Looking Infrared
(FLIR) images, is one detector in this system. An automatic target
recognition algorithm is being developed to assist the vehicle
operator, and the Remote Minefield Detection Hierarchical
algorithm, described in this paper, is a promising candidate.

This pipelined algorithm has been implemented on a network of
transputers and tested using samples of AAIR imagery. It has now
been adapted for use on FLIR images with success. Furthermore, the
advances in computer processing power and inter-processor
communications in recent years have led to a reconsideration of the
system's hardware implementation.

This paper briefly presents the general system architecture of the
Remote Minefield Detection Hierarchical. It then describes the
significant changes made to adapt this system to identify mine
objects in FLIR imagery. Finally, it outlines a proposed real-time
hardware architecture.

2.  REMOTE  MINEFIELD  DETECTION 
HIERARCHICAL 

2.1 Algorithm Structure: The general pipelined algorithm for
real-time detection of sparse small objects in images can be
described using Figure 1. The major parts of the algorithm consist
of the Low Level Target Cueing to reject non-suspect regions and
thus drastically reduce the data rate, the Middle Level Target
Shape Analysis to classify the suspect regions as target or
non-target relying upon their morphological features, the High
Level Target Spatial Analysis to extract the features of the
spatial relationship between mine-like objects, and the Top Level
Knowledge Integration to resolve whether or not the image contains
mines using the spatial analysis results and external information
resources.

Raw image data is acquired by a sensor and passed into the Image
Correction (IC) stage. The IC module adjusts the raw image data to
compensate for distortions, dropouts, overlapping swaths,
misregistration, and other artifacts and imperfections due to the
scanning process.

After the IC stage, the data is passed to the Non-Suspect Region
Rejection (NSRR) stage. The NSRR reduces the immense data flow to a
stream of small images (subimages) that are likely candidates to
contain mines. To accomplish this task, the NSRR collects a block
of scan lines of image data and divides it into smaller
non-overlapping square regions called subimages. Each subimage is
further divided into smaller non-overlapping square regions, whose
width is smaller than the dimensions of a small target but large
enough to provide a stable average of the region. The contrast
values of these small regions against the parent subimage are
calculated and ranked, and only the few top-ranked regions are
selected as suspect regions. The data of these suspect regions is
then transferred to the Local Region Thinning (LRT) module, where
redundant subimages are eliminated.

The next stage, Local Region Segmentation (LRS), partitions the
subimages into homogeneous regions. These regions are then passed
to the Local Region Feature Extraction (LRFE) block, where various
morphological features such as pixel area, average pixel intensity
and region compactness are measured for each candidate mine-like
object. The next stage, Local Region Classification (LRC),
categorizes the assembled feature vectors into classes, labeling
each as target or non-target with an estimated likelihood based on
the extracted features.

The results of the classification are passed to the higher level,
Target Spatial Analysis. Here, the relative positions of likely
mines are first analyzed and clusters of various spatial sizes and
shapes are formed using a clustering algorithm. The clusters are
then delivered to Global Region Feature Extraction (GRFE), where
both statistical measurements and pattern descriptors of each
cluster, as well as the spatial inter-relationships among clusters,
are computed and extracted. Depending on the type of patterns
encountered, either a statistical or a syntactic pattern classifier
may be more appropriate. Scatterable minefields can be adequately
characterized by a low-dimensional feature space and hence are
sufficiently handled by statistical classification, while patterned
minefields are more amenable to syntactic classification because of
their recognizable arrangements. However, since the patterns are
not known in advance, both classifiers are needed and are operated
in parallel.

In the final level, Knowledge Integration, the expert system
integrates the statistical and syntactic data output from GRFE with
other sources from the user and the knowledge base to decide if the
image likely contains a minefield.
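The NSRR ranking step described above can be sketched in a few lines. The text does not define the contrast measure, so the sketch assumes it is the absolute difference between a block mean and the parent subimage mean; the function name, block size, and toy image are likewise hypothetical.

```python
def suspect_blocks(subimage, block, top):
    """Rank block means by contrast against the parent subimage mean; keep the top few."""
    rows, cols = len(subimage), len(subimage[0])
    parent_mean = sum(map(sum, subimage)) / (rows * cols)
    scored = []
    for r in range(0, rows - block + 1, block):
        for c in range(0, cols - block + 1, block):
            values = [subimage[i][j]
                      for i in range(r, r + block) for j in range(c, c + block)]
            contrast = abs(sum(values) / len(values) - parent_mean)  # assumed measure
            scored.append((contrast, (r, c)))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [position for _, position in scored[:top]]

# Toy 8 x 8 subimage with one bright 2 x 2 patch whose corner is at row 4, column 2.
image = [[0] * 8 for _ in range(8)]
for r in (4, 5):
    for c in (2, 3):
        image[r][c] = 100
suspects = suspect_blocks(image, block=2, top=1)
```

Only the top-left corners of the selected blocks are returned, so the downstream stages receive a short list of positions rather than the full image, which is the data-rate reduction the stage is meant to achieve.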


Figure 1: Remote Minefield Detection Hierarchical.

2.2 Algorithm Modification for FLIR Imagery: The system
architecture described in Section 2.1 was initially designed for
AAIR airborne imagery. Some modifications of the algorithm and of
the hardware structure have been researched and developed to
accommodate FLIR imagery.


Figure  2:  A  FLIR  image  with  visible  mines. 


FLIR images can record individual mines, but the whole minefield
arrangement cannot be seen. Therefore the Target Spatial Analysis
component is not needed. However, the mine shapes are usually
distorted because of the aspect angle and the shallow depth of
field of the camera. An additional software component may be
required to restore the image perspective. However, to retain the
hardware configuration and to achieve real-time speed, this problem
might be corrected more efficiently by utilizing different
thresholds and reference values for the mine and background feature
vectors. Our study indicated that all 9 of the morphological
feature components should be used in the FLIR case.

Furthermore, because of the slow speed of the detection vehicle,
which consequently produces a low input data rate, a simpler
algorithm and hardware implementation for the Target Cueing level
might be more appropriate. Also, the number of mines in a FLIR
image is small and the mine-like object sizes are rather large,
implying that fewer computations are likely needed to identify
suspect regions.

2.2.1 Non-Suspect Region Rejection (NSRR): A new averaging method
was introduced to better distinguish the contrast between objects
and background. This method also solved the problem that the upper
region of a FLIR image is usually darker than the lower part
(Figure 2).

2.2.2 Local Region Thinning (LRT): Because of the low
signal-to-noise ratio in FLIR, more suspected objects were selected
by NSRR. Thus the LRT was optimized to process the thinning
function more quickly.

2.2.3 Local Region Segmentation (LRS): The targets in FLIR are
quite large compared with those in AAIR imagery. Thus shape
analysis is the key to success. Currently, the Region Split And
Merge (RSAM) algorithm is used for segmentation due to its speed
and simplicity.

Real FLIR images of mines commonly have many tiny dots scattered
around suspect objects, and these dots are usually much darker than
the main targets. To save processing time for the segmenter, it is
desirable to clean up this clutter by using the mean of a scan
window as the intensity threshold to filter out all objects that
are darker.
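A minimal sketch of this clutter suppression, assuming the scan window is a small 2-D list of intensities and that suppressed pixels are set to a fill value of zero (both assumptions; the text specifies only the mean-of-window threshold):

```python
def suppress_dark_clutter(window, fill=0):
    """Replace every pixel darker than the window mean with a fill value."""
    count = sum(len(row) for row in window)
    mean = sum(map(sum, window)) / count
    return [[pixel if pixel >= mean else fill for pixel in row] for row in window]

# Toy window: a bright target (around 200) surrounded by dark dots (10 to 15).
window = [[10, 200, 210],
          [12, 205, 15],
          [11, 14, 13]]
cleaned = suppress_dark_clutter(window)
```

Using the local mean keeps the threshold adaptive to the brightness gradient across a FLIR frame, at the cost of assuming that the target occupies enough of the window to pull the mean above the clutter level.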

Another enhancement is that LRS considers a target (or a
background) as a homogeneous region instead of a multi-region
object. This helps to speed up the shape analysis and reduce the
hardware requirements.

As shown in Figure 3, RSAM sometimes does not wholly separate the
main target from its surrounding blots. The reason for this is that
RSAM was mainly designed for objects which have a smaller object
size/pixel size ratio, and which are not connected to their
adjacent pieces by narrow “bridges”, which unfortunately is not the
case in FLIR. To compensate for this disadvantage, a Smooth
function was utilized to blur out those small “bridges” as much as
possible, and to produce a clean smooth surrounding edge. The
degree of smoothing has to be tuned so that the segmented images do
not lose their characteristic shapes and still satisfy the uniform
surface requirement.


Figure  3:  Scanned  images  of  anti-personnel  mines. 
Left:  good  image.  Right:  bad  image  with  narrow  bridges. 

2.2.4  Local  Region  Feature  Extraction  (LRFE): 

Targets  are  identified  by  classification  of  patterns  of 
morphological  features  extracted  from  the  segmented 
regions.  These  features  must  be  chosen  according  to  the 
particular  image  type.  For  the  minefield  detection 
problem  under  discussion,  it  was  decided  to  use  well 
established  morphological  quantities  to  form  the 
components  of  the  feature  vector  of  a  region,  since  these 
have  been  extensively  studied  and  are  easily  calculated. 

Study of the FLIR feature vectors using the On-line Pattern
Analysis and Recognition System (OLPARS) software suggested that 9
features should be used. They are: the region area, intensity mean,
intensity variance, maximum intensity, minimum intensity,
4-adjacency perimeter, 4-adjacency size, 4-adjacency compactness,
and height/width ratio.
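The features listed above could be computed from a segmented region along the following lines. This is a sketch under assumptions: the region is given as a mapping from pixel coordinates to intensity, the 4-adjacency perimeter counts pixel edges that face outside the region, and compactness is taken as perimeter squared over $4\pi$ times area. The text does not define "4-adjacency size", so it is left out rather than guessed.

```python
import math

def region_features(pixels):
    """pixels: dict mapping (row, col) -> intensity for one segmented region."""
    area = len(pixels)
    values = list(pixels.values())
    mean = sum(values) / area
    variance = sum((v - mean) ** 2 for v in values) / area
    # 4-adjacency perimeter: pixel edges whose 4-neighbour lies outside the region.
    perimeter = sum((r + dr, c + dc) not in pixels
                    for (r, c) in pixels
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)))
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    height = max(rows) - min(rows) + 1
    width = max(cols) - min(cols) + 1
    return {
        "area": area,
        "intensity_mean": mean,
        "intensity_variance": variance,
        "intensity_max": max(values),
        "intensity_min": min(values),
        "perimeter_4adj": perimeter,
        "compactness_4adj": perimeter ** 2 / (4 * math.pi * area),  # assumed definition
        "height_width_ratio": height / width,
    }

# Toy region: a uniform 2 x 3 block of intensity 5.
region = {(r, c): 5 for r in range(2) for c in range(3)}
features = region_features(region)
```

All of these quantities are single passes over the region pixels, which is consistent with the remark that the chosen features are easily calculated.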

2.2.5  Local  Region  Classification  (LRC):  In  this  stage,  a 
classifier  will  use  a  database  of  feature  vectors  of  known 
mines  to  test  against  those  of  an  unknown  object.  The 
classifier  combines  the  feature  values  of  an  object  into  one 
or  more  values  that  will  be  compared  with  the  database. 

The Nearest Mean Classifier was used in the LRC. Analysis of a
number of feature vectors of known objects with the pattern
analysis tool OLPARS revealed that mine classes are clustered and
well separated from background classes. Thus misclassification can
ideally be minimized.
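A nearest mean classifier of the kind named above can be sketched as follows. The two-component feature vectors, class labels, and numeric values are invented for illustration; the paper's classifier works on the 9-component feature vectors and its distance measure is not stated, so Euclidean distance is an assumption.

```python
import math

def class_means(training):
    """training: dict label -> list of feature vectors; returns label -> mean vector."""
    return {label: [sum(column) / len(vectors) for column in zip(*vectors)]
            for label, vectors in training.items()}

def nearest_mean(vector, means):
    """Return the label whose class mean is closest in Euclidean distance."""
    return min(means, key=lambda label: math.dist(vector, means[label]))

# Hypothetical training data: (intensity mean, compactness) per known object.
training = {
    "mine": [[120.0, 0.80], [130.0, 0.90], [125.0, 0.85]],
    "background": [[30.0, 0.20], [40.0, 0.30], [35.0, 0.25]],
}
means = class_means(training)
label = nearest_mean([118.0, 0.70], means)
```

When features have very different scales, as in this toy data, the larger-valued feature dominates the distance, so in practice the components would usually be normalized before classification.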

An important key for success is to design a good database that LRC
will depend on to identify targets. A small database will leave
many mine objects unrecognized, while an overly large database will
cause the classifier to select much clutter or to be highly biased
(overtrained) on a certain set of test images.

3.  HARDWARE  IMPLEMENTATION 

The current architecture, conceived in 1989, is based on a
distributed network of transputer computing nodes, connected to a
workstation. The low and mid levels of the algorithm are
implemented on the network, while the high and top levels are
implemented on the Sparc workstation. This system utilizes an array
of i860 vector processors and transputers plus local memory
(TRAMs), with varying topologies for each of the processing nodes.
The network is highly scalable, i.e., increasing the number of
elements requires only minor changes to the program code, and it
utilizes a simple point-to-point serial communications protocol for
interconnection.

Although the transputer was an attractive computing element, the
technology is now over 15 years old. Although improvements have
been made, in recent years the speed of an individual transputer
has fallen badly behind that of other digital signal processor
(DSP) boards, while the latter have improved in speed, ease of
inter-processor communication and price.

A  study  of  an  alternative  hardware  implementation  of  the 
Remote  Minefield  Detection  system  had  concludes  that 
the  RMD  system  performs  substantial  image  processing, 
but  does  not  appear  to  fully  utilize  the  enhanced  abilities 
of  a  digital  signal  processor.  Thus,  it  is  not  recommended 
that  a  digital  signal  processor  such  as  the  SHARC  be 
utilized.  Instead  general-purpose  processors  such  as  the 
PowerPC, Pentium and Alpha should be considered.
Processor generations that offer SIMD capabilities should be
selected, and those capabilities utilized as
appropriate. Besides being much faster at the image
analysis  than  the  SHARC,  these  general-purpose 
processors  have  the  advantage  of  being  much  cheaper,  as 
well  as  having  much  lower  development  costs.  The 
SHARC  is  more  optimized  for  execution  of  fast  multiplies 
and  accumulates,  but  this  is  not  something  that  the  RMD 
system  does  much  of.  NSRR  and  LRT  are  the  two  most 
computationally  intensive  functions  that  take  a  lot  of 
hardware  to  run,  while  the  remaining  algorithms  LRA, 
RSAM  and  LRFE  scale  linearly  as  the  image  size 
increases. 

4.  PRELIMINARY  RESULTS 

The  algorithm  has  been  developed  up  to  the  expert  system 
level.  The  hardware  configuration  was  implemented  using 
a  network  of  transputers.  Non  real-time  studies  using 
simulated,  yet  realistic,  thermal  infrared  AAIR  images  and 
actual  passive  infrared  FLIR  imagery  have  demonstrated 
that  the  algorithm  is  successful  in  detecting  individual 
targets. 

In the recent tests, the algorithm up to and including the
Local Region Classification was applied to a number of
synthetic AAIR images, which consist of scattered
minefields, patterned minefields and a combination of the
two. The probability of detection of individual mines was
estimated to be 90% and the probability of false alarm was
2%. For real FLIR images, the results are approximately
64.6% probability of detection and 25.6% probability of
false alarm.

5.  CONCLUSIONS 

A pipelined algorithm and parallel architecture for real-time
detection of mine objects in monochromatic AAIR
imagery and passive infrared FLIR imagery have been
described. The algorithm was implemented on a
distributed computing system, and all the modules of the
algorithm were tested on simulated and real passive
infrared minefield images. Non-real-time preliminary test
results indicate the algorithm can reliably and consistently
detect mines and minefields. Real-time operation should
be achievable with a modestly sized, but special-purpose,
parallel and pipelined computer system. In addition, a modern
computer architecture was recommended to upgrade the
initially designed transputer hardware platform.

ABSTRACT

Ground Penetrating Radar (GPR) has become widely accepted
as a major technique for subsurface investigations, mainly in civil
engineering. Recently, considerable effort has been put into the
development of GPR systems for the detection of shallow buried
landmines. However, GPR performs inadequately due to clutter, which
dominates the data and obscures the mine information. The
clutter varies with surface roughness and soil conditions and leads to
uncertainty in the measurements. It is therefore necessary to
overcome these surrounding effects when processing GPR data for
detecting small, shallow buried objects.

In this paper we present improved signal processing
techniques which can be used to reduce the clutter through data
preprocessing. Several approaches have been proposed for GPR clutter
reduction, most of which model the clutter statistically.
The proposed clutter reduction technique models the clutter using
parametric modeling. The clutter contained in the measurements
is modeled as an ARMA process.

The advantage of such an approach is that once the
clutter is satisfactorily known, any target will show up as a small
anomaly against the known clutter background. This method
assumes that the clutter shows a certain amount of correlation.
Experimentally it is shown that the dominant interference in GPR
data is correlated clutter, i.e., interference which has a large
correlation coefficient for lags greater than zero. However, the clutter
environment cannot be considered completely stationary. An ideal
filter would then be an adaptive filter which estimates the slowly
varying local clutter parameters while ignoring the small
parameter jumps caused by the buried targets to be detected. A Kalman
filter is used to estimate the clutter parameters in the
presence of random noise, to detect jumps that occur at unknown
points in time, and to provide estimates of the new parameter
values, without altering the target return.

1  INTRODUCTION. 

The detection of minimum-metal anti-personnel land mines
with GPR encounters the problem of the extreme clutter
environment within the first 5 cm of the soil surface. Almost anything
under the surface of the ground presents a return signal, which
may be confused with a valid (lethal) target. Since in
humanitarian demining it is mandatory that a lethal target be detected
with nearly 100 per cent reliability in any soil type, the processing
needs to 'clean up' the GPR data before any other tomographic or
recognition algorithms can be applied.

This paper focuses on the pre-processing of GPR data, in
order to reduce drastically the influence of the near-surface clutter.
In order to estimate this clutter, a method of "Clutter Parameter
Estimation" is chosen. The advantage of such an approach is
that once the clutter is satisfactorily known, any target will show
up as a small anomaly against the known clutter background.

The second section discusses the representation of the signal,
what exactly is contained in the clutter, and the other signal parts
that are removed from the signal. Section three introduces an
ARMA model for clutter estimation and subsequent reduction.
Section four presents a method based on the Kalman filter to
estimate the parameters of the clutter, and shows how to remove it. Section
five shows the results of both methods. Finally, the last section
draws some conclusions.

2  SIGNAL  REPRESENTATION. 

The  basic  model  for  the  GPR  returns  used  in  this  work  can  be 
represented  as: 

$E_{\mathrm{rec'd}}(k) = E_{\mathrm{rad}}(k) \otimes (h_c(k) + h_t(k)) + n$ (1)

This represents the relationship between the radiated electric
field and the received one, where $h_c(k)$ and $h_t(k)$ are the impulse
responses of the clutter and target, respectively, $\otimes$ denotes
convolution, and $n$ represents the measurement noise.
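
As a rough illustration of Eq. (1), the sketch below builds a synthetic A-scan by convolving an emitted pulse with the sum of a clutter and a target impulse response and adding white noise. The function name `simulate_ascan` and all numerical values are hypothetical, chosen only for illustration; they are not taken from the measurements described in this paper.

```python
import numpy as np

def simulate_ascan(e_rad, h_clutter, h_target, noise_std=0.0, rng=None):
    """Received signal: E_rad convolved with (h_c + h_t), plus measurement noise."""
    rng = np.random.default_rng() if rng is None else rng
    h_total = np.asarray(h_clutter, dtype=float) + np.asarray(h_target, dtype=float)
    clean = np.convolve(e_rad, h_total)
    return clean + noise_std * rng.standard_normal(clean.shape)

# Hypothetical toy values (assumptions, not measured data).
e_rad = np.array([1.0, -0.5])                # emitted pulse
h_c = np.array([0.8, 0.3, 0.0, 0.0, 0.0])    # strong, early clutter response
h_t = np.array([0.0, 0.0, 0.0, 0.2, 0.0])    # weak, later target response
ascan = simulate_ascan(e_rad, h_c, h_t)      # noise-free case
```

Because convolution is linear, the clutter and target contributions add in the received signal, which is why subtracting a good clutter estimate leaves the target return.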

The removal of $E_{\mathrm{rad}}(k)$, the emitted signal, by deconvolution
is the first step that has to be taken in the pre-processing of the data.
This step can be performed before or after the clutter reduction is
executed. The only difference is that the clutter to be removed will
be in a different representation. Since our algorithms are based on
learning and estimating the local clutter parameters, the
difference should be minimal. Experience showed, however, that the
deconvolution process (extensively discussed in [1]) yields better
results when performed after clutter reduction. This is why in this
paper the clutter reduction is performed directly on the raw
data.

The clutter includes many components, such as the
crosstalk from transmitter to receiver antenna, the initial ground
reflection and the reflections resulting from non-target scatterers
within the soil. At this stage, a target signal can be either a mine
or a non-mine object of mine-like size; it is the objective of the
post-processing to make the decision between the two. The noise
component specifically refers to the random measurement noise which


0-7803-7011-2/01/$10.00 ©2001 IEEE




adds to the composite signal, and is assumed to be dealt
with during the deconvolution step.


3  ARMA  MODEL  FOR  CLUTTER 
ESTIMATION 


In this section a simple and straightforward implementation
is discussed.

The process starts with a small number of known clutter
samples. The samples are then represented in a certain domain. Many
choices of domain are available; in a system identification
approach, one of the most natural choices is the ARMA model. This
model estimates the transfer function $H(z)$ which yields the
signal to be represented when excited with a unit impulse input. The
system described by $H(z)$ may be written in transfer function
format as


$H(z) = \frac{B(z^{-1})}{A(z^{-1})}$ (2)

where

$A(z^{-1}) = 1 + a_1 z^{-1} + \dots + a_{n_a} z^{-n_a}$ and $B(z^{-1}) = b_0 + b_1 z^{-1} + \dots + b_{n_b} z^{-n_b}$ (3)

This is equivalent to a discrete linear time-invariant system
described by the difference equation

$y(n) + a_1 y(n-1) + a_2 y(n-2) + \dots + a_{n_a} y(n-n_a) = b_0 u(n) + b_1 u(n-1) + \dots + b_{n_b} u(n-n_b)$ (4)

where $y(n)$ is the output sequence and $u(n)$ is the input sequence,
and $n_a$, $n_b$ are the orders of the output and input processes,
respectively. For causality, $n_b < n_a$.

The vector of coefficients $[a_i, b_i]$ is the ARMA representation
of the original signal. The parameters of the target may be
estimated only after the clutter/noise parameters are determined and
removed.

The clutter model may be represented by

$\theta_c = [a_1, a_2, \dots, a_{n_a}, b_0, b_1, \dots, b_{n_b}]$ (5)

and its block diagram is shown here:

[Block diagram: $\delta(n)$ → clutter model → $s_c(n)$]

Representation of Clutter Parametric Model.

The input to the clutter block diagram is a $\delta$-function under
the hypothesis of an ideally deconvolved emitted signal.

When modeling real systems, it is usually necessary to include
a noise model which will account for any random disturbances
caused by the measurement equipment, etc. The inclusion of a
(possibly time-varying) noise model is therefore indicated.

Such a noise process model can be described by

$\theta_n = [g_1, g_2, \dots, g_{n_g}, f_0, f_1, \dots, f_{n_f}]^T$ (6)

The input to the noise model is a vector of independent,
identically distributed (i.i.d.) samples with a Gaussian amplitude
distribution of zero mean and constant spectral intensity. The input
is applied to a filter which provides appropriate spectral and
amplitude shaping to represent the measured noise from A-Scan to
A-Scan.

Once the clutter estimate and noise processes are determined,
the target may be easily extracted by simply subtracting the sum
of those estimates from the measured signal.


4 KALMAN FILTER FOR PARAMETERS ESTIMATION.

In this approach the ARMA parameters of the clutter are
estimated using a Kalman filter, where the parameters are considered
as being constant with some fluctuations. In order to dynamically
take into account the presence of scattering from an object,
abnormalities (parameter jumps) are detected, and the Kalman filter is
restarted using the previously estimated parameters. The proposed
approach is based on the ideas suggested in [2].

The Kalman filter is based on the following equations [4]:
the system equation

$X_{k+1} = F_k X_k + B_k U_k + W_k$ (7)

and the observation equation

$Z_k = H_k X_k + V_k$ (8)

where $X$ is the state vector, $F$ the state transition matrix, $U$ the
input, $H$ the observation matrix, $V$ some Gaussian noise process
and $Z$ the measured output.

In this model the ARMA representation is used so that:

$H_k = [-Z_{k-1} \;\; -Z_{k-2} \;\; \dots \;\; -Z_{k-n} \;\; U_{k-1} \;\; U_{k-2} \;\; \dots \;\; U_{k-n}]$ (9)

and

$X_k = [a_1 \;\; a_2 \;\; \dots \;\; a_n \;\; b_1 \;\; b_2 \;\; \dots \;\; b_n]$ (10)

In our application the input can be considered to be zero, so
that the equations are simplified:

$X_{k+1} = X_k + W_k$ (11)

where $W_k$ is the process noise.

The Kalman filter algorithm is given by the following
equations:

$K_k = P_k^- H_k^T (H_k P_k^- H_k^T + R)^{-1}$ (12)

$X_k = X_k^- + K_k (Z_k - H_k X_k^-)$ (13)

$P_k = (I - K_k H_k) P_k^-$ (14)

where $P_k^-$ stands for $P_{k/k-1}$ and $R$ represents the
measurement noise covariance matrix.

This standard Kalman filter algorithm allows the estimation of
the parameters in the presence of noise. In order to detect when
the parameters deviate abruptly from their nominal value, due to
scattering from a target, and readjust the filter gain so as to
produce new estimates of the changed parameters, a hypothesis test
is applied to the normalized residual $(Z_k - H_k X_k^-)$.

Let $Q_{zk} = E[Z_k Z_k^T]$ be the covariance of $Z_k$. The normalised
residual is given by:

$Z_{nk} = \frac{Z_k - H_k X_k^-}{\sqrt{Q_{zk}}}$ (15)

with

$Q_{zk} = H_k P_k^- H_k^T + R$ (16)

In this case $Z_{nk}$ is a zero-mean Gaussian random variable. The
comparison of $Z_{nk}$ with a threshold indicates whether or not the
signal parameters have undergone a jump. On this basis the
filter parameters are adjusted accordingly.
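
The tracking and jump-detection loop of Eqs. (11) to (16) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used for the results below: the input is taken as zero, so the state holds only the $a_i$ coefficients; the function name `kalman_ar_tracker`, the default values of `q`, `r` and `threshold`, and the policy of skipping the update on a flagged sample and restarting the covariance while keeping the previous parameter estimate are all hypothetical choices.

```python
import numpy as np

def kalman_ar_tracker(z, order=2, q=1e-6, r=1e-2, threshold=3.0):
    """Track AR clutter parameters along one A-scan and flag parameter jumps.

    q, r and threshold are illustrative tuning values (assumptions).
    Returns (final parameter estimate, normalized residuals, jump indices).
    """
    z = np.asarray(z, dtype=float)
    x = np.zeros(order)               # a priori parameter estimate
    P0 = np.eye(order)                # covariance used at every (re)start
    P = P0.copy()                     # a priori covariance
    residuals = np.zeros(len(z))
    jumps = []
    for k in range(order, len(z)):
        H = -z[k - order:k][::-1]                  # Eq. (9) with zero input
        innovation = z[k] - H @ x
        Qz = H @ P @ H + r                         # Eq. (16)
        residuals[k] = innovation / np.sqrt(Qz)    # Eq. (15)
        if abs(residuals[k]) > threshold:
            # Assumed policy: treat the sample as a target return, skip the
            # update, and restart the filter from the previous estimate.
            jumps.append(k)
            P = P0.copy()
            continue
        K = P @ H / Qz                             # Eq. (12)
        x = x + K * innovation                     # Eq. (13)
        P = (np.eye(order) - np.outer(K, H)) @ P   # Eq. (14)
        P = P + q * np.eye(order)                  # random-walk model, Eq. (11)
    return x, residuals, jumps

# Noise-free AR(1) "clutter" z(k) = 0.9 z(k-1), with one spike standing in
# for a target return (toy data, not a measured A-scan).
z = 0.9 ** np.arange(60)
z[30] += 5.0
x_hat, z_norm, jumps = kalman_ar_tracker(z, order=1)
```

In this toy run the spike produces a normalized residual far above the threshold, so its index is reported in `jumps`. How the filter should treat the samples that follow a detected jump is a design choice that this sketch does not settle.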

5  EXPERIMENTAL  RESULTS  AND 
DISCUSSION 

The data used in this section were taken from two sources.
The first was acquired at TUI¹, the second at RMA². The data
from the TUI are represented by a Bscan, acquired while scanning
over a number of buried objects. The horizontal axis represents
scanned distance with a scanning step of 2 cm, while the vertical
axis shows measured time samples. The raw data used here are the
data after subtraction of the measured crosstalk. The data from
RMA represent a Bscan over one buried object with a scanning
step of 1 cm.


Figure 2: The results of the first method on one of the channels of
the array data: (a) raw data, (b) after processing, (c)
representative Ascans from both (.- raw, - processed).

5.1  ARMA  model  for  clutter  estimation. 


Here we show some results for the method described in Section 3.


Figure 1: The results of the dynamic method on the bistatic data:
(a) raw data, (b) after processing.


In Figures 1 and 2 the results for the dynamic method are
shown for the bistatic data and the array data, respectively. As can be seen
by comparing the raw data Bscans with the processed ones, the clutter
both above and below the signals is reduced. The remaining
signals are, however, somewhat distorted, as can be seen in Figure 2(c).
Here the same representative Ascan is extracted from the raw and
processed Bscans and plotted on the same figure (the line with .-
represents the raw data and the full line the processed data). It
shows that the clutter, especially above the signal, is reduced, but
that the signal itself has undergone some distortion.


5.2  The  Kalman  Filter 


Here we show some results for the method described in Section 4.


¹ The Technische Universität Ilmenau has a setup that simulates an array of 6 emitting and receiving antennae. The data is acquired in the frequency
domain, between 1 and 6 GHz.

² The Royal Military Academy has an ultra-wideband system with one emitting and one receiving antenna. For a more detailed description of the system,
refer to [3].

Figure 3: The results of the Kalman method on the bistatic data:
(a) raw data, (b) after processing.


Figure 4: The results of the Kalman method on one of the
channels of the array data: (a) raw data, (b) after processing, (c)
representative Ascans from both (.- raw, - processed).


In Figures 3 and 4, the proposed Kalman processing was
performed on the representative Bscans. As explained in Section 4,
the proposed method is a one-dimensional one. The clutter above
and below the signals is almost completely nullified. In Figure 3
some clutter remains above the signal, due to the fact that the raw
data still contained the antenna crosstalk. Figure 4(c) again shows
the comparison between two representative Ascans (the line with .-
represents the raw data and the full line the processed data). Here
it is clear that the clutter above and below the signal is eliminated,
while the shape of the original signal is preserved quite accurately.


6  CONCLUSION 

Two methods were introduced theoretically to reduce the 2D
clutter in GPR Bscan images. They were applied to data
originating from two different types of GPR. The Kalman method was
found to give the better results, reducing most of the clutter to zero
while preserving the shape of the original signal.

ABSTRACT

Traditionally,  electromagnetic  induction  (EMI)  sensors  are 
operated  in  the  time-domain  and  the  response  strength  is  related 
to  the  amount  of  metal  present  in  the  object.  These  sensors  have 
been  used  almost  exclusively  for  landmine  detection. 
Unfortunately,  there  is  often  a  significant  amount  of  metallic 
clutter  in  the  environment  that  also  induces  an  EMI  response. 
Consequently,  EMI  sensors  employing  detection  algorithms 
based  solely  on  metal  content  suffer  from  large  false  alarm  rates. 
A  second  issue  regarding  processing  of  data  collected  on  highly 
cluttered  sites  is  that  anomalies  are  often  in  close  proximity,  and 
the  measured  EMI  signal  consists  of  a  weighted  sum  of 
responses  from  each  anomaly.  To  mitigate  the  false  alarm 
problem,  statistical  algorithms  have  been  developed  which 
exploit  models  of  the  underlying  physics  for  mines  with 
substantial  metal  content.  In  such  models  it  is  commonly 
assumed  that  the  soil  has  a  negligible  effect  on  the  sensor 
response; thus the object is modeled in “free space”. To date,
such advanced algorithms have not been applied specifically to
the problem of detecting low-metal mines in a cluttered
environment. Addressing this problem requires considering the
effects  of  soil  on  signatures,  separating  the  multiple  signatures 
constituting  the  measured  EMI  response  as  well  as 
discriminating  between  landmine  signatures  and  clutter 
signatures.  In  this  paper,  we  consider  statistically  based 
approaches  to  the  landmine  detection  and  classification  problem 
for  frequency-domain  EMI  sensors.  We  also  develop  a 
preliminary  statistical  approach  based  on  independent 
components  analysis  (ICA)  for  separating  the  signals  of  multiple 
objects  that  are  within  the  field  of  view  of  the  sensor  and 
illustrate  the  performance  of  this  approach  on  measured  data. 

1.  INTRODUCTION 

Land  mines  and  unexploded  ordnance  (UXO)  present  a 
significant  threat  to  individuals  around  the  world.  Currently 
deployed  methods  of  clearing  subsurface  threat  items  are  slow 
and  less  than  100%  accurate.  A  variety  of  sensors  for  landmine 
and  UXO  detection  have  been  proposed  and  utilized,  each  of 
which  exploits  a  different  fundamental  phenomenology.  The 
most  commonly  deployed  sensor  is  an  electromagnetic  induction 
(EMI)  sensor  that  operates  by  detecting  the  metal  present  in  land 
mines.  The  high  level  of  risk  associated  with  the  landmine 
detection  problem  requires  100%  detection  performance  for  any 
viable  sensor  for  all  possible  targets.  However,  there  are 
hundreds  of  varieties  of  land  mines  that  vary  in  their  construction 
from  metal-cased  varieties  with  a  large  mass  of  metal  to  plastic- 
cased  varieties  with  very  small  amounts  of  metal.  In  addition, 
there  is  often  a  significant  amount  of  metallic  debris  (clutter) 
present  in  the  environment.  Consequently,  EMI  sensors  that 
utilize  traditional  detection  algorithms  based  solely  on  the  metal 


content  operating  at  a  high  enough  detection  rate  to  satisfy 
performance  requirements  suffer  from  very  high  false  alarm  rates. 

To  address  the  false  alarm  issue,  several  groups  have  investigated 
target  identification,  or  discrimination,  using  EMI  sensors,  while 
other  groups  have  considered  alternative  sensor  modalities  and 
sensor  fusion  [1-8].  In  [1,7],  classification  of  metal  targets  using 
frequency-domain  EMI  sensors  is  considered,  and  a  Bayesian 
approach  is  applied  to  address  the  inherent  uncertainties 
concerning  the  target/sensor  orientation.  In  [2],  this  same  sensor 
is  utilized  to  investigate  the  frequency-domain  signatures  of 
landmines.  In  the  work  presented  here,  a  statistically-motivated 
approach  is  evaluated  in  a  blind  field  trial,  which  is  a  better 
method  of  evaluating  robustness  than  evaluating  algorithms 
solely  on  synthesized,  laboratory,  or  fully  ground-truthed  data. 

Some  statistical  approaches  to  this  problem  may  be  ineffective 
since  a  statistical  model  is  needed  to  describe  the  null  hypothesis, 
and  there  is  often  insufficient  data  available  to  adequately 
develop an accurate statistical model [6]. Although the response
to  the  ground  can  be  characterized  statistically,  it  is  very  likely 
that  it  is  inappropriate  to  model  discrete  clutter  in  the  same 
manner.  Recent  results  from  the  signal  processing  community 
have  indicated  that  an  adaptive  coherence  detector  [10]  has 
optimality  properties  under  a  set  of  conditions  that  are  applicable 
to  the  landmine  detection  problem  considered  here.  Specifically, 
when  detecting  a  signal  of  known  form  but  unknown  amplitude 
in  zero-mean  Gaussian  noise  with  a  known  covariance  structure 
but  unknown  gain,  a  correlation  coefficient,  or  cosine  statistic, 
provides a uniformly most powerful invariant test statistic [9,10].
Since  the  subspace  algorithm  is  predicated  on  an  assumption  that 
the  null  hypothesis  follows  a  zero  mean  Gaussian  distribution, 
this  particular  detector  is  not  directly  applicable  to  the  problem 
of  landmine  detection  in  the  presence  of  anthropic  clutter. 
However,  the  invariance  class  associated  with  this  algorithm  does 
address  some  of  the  issues  associated  with  the  detection  problem 
and  the  sensor  that  we  are  considering. 

In  this  paper,  performance  of  an  algorithm  based  on  a  subspace 
detector  was  evaluated  on  data  collected  in  a  blind  field  test.  We 
have  modified  the  original  formulation  of  the  detector  to  consider 
both  a  multi-component  alternative  hypothesis  and  a  null 
hypothesis  that  does  not  follow  a  zero-mean  multivariate  normal 
model.  We  developed  a  set  of  features  that  could  be  used  to 
robustly  differentiate  between  landmines  and  discrete  clutter  and 
that  could  also  be  extracted  from  the  EMI  data  in  real  time. 

2.  SENSOR  AND  DATA 

The  GEM-3  is  a  prototype  wide-band  frequency-domain  EMI 
sensor manufactured by Geophex, Ltd. The GEM-3 uses a pair of
concentric, circular coils to transmit a continuous, wideband,
electromagnetic waveform [8]. The resulting field induces a
secondary  current  in  the  earth  as  well  as  in  any  buried  objects. 
The set of two transmitter coils has been designed so that they
create  a  magnetic  cavity  at  the  center  of  the  two  coils.  A  third 
receiving  coil  is  placed  within  the  magnetic  cavity  so  that  it 
senses  only  the  weak  secondary  field  returned  from  the  earth  and 
buried  objects.  The  frequency-domain  in-phase  and  quadrature 
components  are  obtained  from  the  received  signal  by  convolving 
the  received  time-series  with  a  sine  time-series  (for  in-phase)  and 
cosine  time-series  (for  quadrature)  at  the  frequency  of  interest. 
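
A minimal sketch of this demodulation step is given below. It assumes that the operation reduces to a single inner product of the received record with a reference sine and cosine over an integer number of cycles; the function name `in_phase_quadrature`, the sampling parameters and the 2/N amplitude scaling are assumptions for illustration, while the pairing of sine with in-phase and cosine with quadrature follows the description above.

```python
import numpy as np

def in_phase_quadrature(x, freq_hz, fs_hz):
    """Correlate a real time series with sine and cosine references at freq_hz.

    Assumes len(x) covers an integer number of cycles of freq_hz.
    """
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x)) / fs_hz
    scale = 2.0 / len(x)                       # recovers sinusoid amplitudes
    i_comp = scale * np.dot(x, np.sin(2 * np.pi * freq_hz * t))
    q_comp = scale * np.dot(x, np.cos(2 * np.pi * freq_hz * t))
    return i_comp, q_comp

# Toy record (hypothetical values): 1 s sampled at 1 kHz, 50 Hz component.
fs, f0 = 1000.0, 50.0
t = np.arange(1000) / fs
record = 3.0 * np.sin(2 * np.pi * f0 * t) + 4.0 * np.cos(2 * np.pi * f0 * t)
i_val, q_val = in_phase_quadrature(record, f0, fs)
```

Repeating the computation at each transmitted frequency would give one in-phase and one quadrature value per frequency.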


When a mine is present, the response of the sensor consists of the
sum of a response due to the mine and a response due to the
background. To obtain the response due to the mine alone, it is
necessary to determine the response of the sensor to the
background alone. The GEM-3 sensor has some level of thermal
drift in its background response [6], so the sensor noise cannot
be treated with a simple statistical model (e.g. zero mean
Gaussian). This drift must be tracked so that the background
response can be removed from the measured signature.

Details of the data collection plan can be found in [11]; however,
the most salient points are summarized here. A 50 meter by 20
meter plot of ground was selected for construction of the test
grid, and a calibration area was created in an adjacent 5 meter by
25 meter plot. Initially, all indigenous clutter was removed from
the site. Mine targets emplaced in the test grids were
predominantly “low metal” mines since these are the most
challenging targets to detect using EMI sensors. Samples of the
indigenous clutter were re-emplaced in the grids to provide
discrete opportunities for false alarms. The ground truth
associated with the calibration area is available; however, the
ground truth associated with the main, or blind, test grid is
sequestered. Algorithm developers provide the output of their
algorithms for each grid square or “decision opportunity” to
JUXOCO for scoring. The GEM-3 EMI sensor was programmed
to measure responses at 20 frequencies spaced logarithmically
between 270 and 23,790 Hz. Ten spatial positions in a “+”
pattern were measured at each grid point, since spatial
information has been shown to improve detection and
discrimination performance [1]. Samples were taken every 2 inches.

In order to track the background signature, background
measurements were taken at one of four grid squares that had
been set aside as known “blanks” by JUXOCO. The closest
“blank” square was used as the background for each grid square
while the measurements were taken in the lanes. A background
measurement was taken before and after signature data was
collected in each grid square. In order to compensate for sensor
drift, a linear prediction algorithm was used to predict the
background signature during each of the 10 measurements made
in the grid square using the background data measured before
and after data was collected in each square. These “corrected”
data were the input to the algorithm described in the next section.

3. ALGORITHM DEVELOPMENT

Kraut, Scharf et al. have used the theory of generalized likelihood
ratio tests to show that previously proposed matched subspace
detectors can be employed in the case of unknown noise
covariance and unknown scaling of a mean vector [9,10]. They
have described what they term “adaptive” matched subspace
detectors, where adaptive implies an estimation of the covariance
structure using training data, and have investigated their
optimality properties. The invariance to overall data scaling as
well as the optimality when testing and training sets have a
different scaling factor applied to the covariance matrix is one of
the most appealing properties of this set of detectors for the landmine
detection problem. As the authors state, what these detectors
sacrifice in high-SNR performance they gain in robustness to
uncertain and changeable prior information regarding the signal
and noise model and statistics.


The general detection problem addressed by Scharf and his
colleagues is one in which an $N$-element signal, $s$, is located in a
signal subspace of dimension $p$. The signal is scaled by an
unknown constant $k$, and scaled noise, $gw$, is added to the signal,
where the scaling factor $g$ is unknown. The measured signal, $r$,
is given by $r = ks + gw$, which follows a complex normal (CN)
distribution, $r \sim CN_N(ks, g^2\Sigma)$. This formulation admits a
solution ranging in complexity from a rank-1 matched filter ($p = 1$)
to rank-$p$ subspace detectors where $s = \Psi\theta$, $\Psi$ denotes the
signal subspace and $\theta$ is the known or unknown parameter vector
which locates the signal in the subspace. Under the signal
hypothesis, $H_1$, $k \neq 0$, while under the null hypothesis, $H_0$,
$k = 0$. Previous work has addressed this detection problem
under various assumptions regarding the parameters
$\Psi$, $\Sigma$, $g^2$, and $\theta$.


For the mine detection problem, we know $s$, but do not know $k$,
$g$, or $\Sigma$. It was shown in [9] that the optimal test under these
uncertainties is given by a cosine-squared statistic

$\cos^2 = \frac{(r^T \Sigma^{-1} s)^2}{(r^T \Sigma^{-1} r)(s^T \Sigma^{-1} s)}$

and is termed a coherent CFAR adaptive subspace detector
(ASD). In the formulation, $\Sigma$ is estimated using maximum
likelihood techniques and then used in the formulation of the
detector. This approach is generally associated with a
generalized likelihood ratio test; however, it was shown in [9] that
this detector is in fact optimal and uniformly most powerful.
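
The scale invariance of the cosine-squared statistic can be checked numerically with a short sketch. The version below is an illustration only: it assumes real-valued vectors and a covariance matrix that is supplied directly rather than estimated from training data, and the function name `cos2_statistic` and the numerical values are hypothetical.

```python
import numpy as np

def cos2_statistic(r, s, sigma):
    """Cosine-squared statistic between measurement r and signature s,
    computed in the metric defined by the inverse of covariance sigma."""
    r = np.asarray(r, dtype=float)
    s = np.asarray(s, dtype=float)
    sigma_inv_r = np.linalg.solve(sigma, r)
    sigma_inv_s = np.linalg.solve(sigma, s)
    numerator = (r @ sigma_inv_s) ** 2
    denominator = (r @ sigma_inv_r) * (s @ sigma_inv_s)
    return numerator / denominator

# Toy values (assumptions): a signature, a perturbed measurement, a covariance.
sigma = np.diag([1.0, 2.0, 4.0])
s = np.array([1.0, 2.0, 3.0])
r = s + np.array([0.3, -0.2, 0.1])
base = cos2_statistic(r, s, sigma)
rescaled = cos2_statistic(7.0 * r, s, 3.0 * sigma)   # same value as base
```

Rescaling the measurement (unknown $k$) or the covariance (unknown $g$) leaves the statistic unchanged, which is the invariance property that makes it attractive when neither scale is known.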


To apply the subspace method described above we first consider
only the data measured at the center point of each grid square, and
assume the data follow the model

$H_1: r_i \sim N(k s_i, g\Sigma)$

$H_0: r_i \sim N(0, g\Sigma)$

where the subscript denotes the $i$th target and $k$ and $g$ are arbitrary
scaling factors on the mean and covariance matrix. For the mine
detection problem considered here, the phenomenology inherent
in the physical problem allows us to interpret the scalar $k$ as
uncertainty in object depth and the scalar $g$ as uncertainty in the
strength of the noise process resulting from the thermal drift.
Under these assumptions, the detector for each spatial position is
given by

$\beta_i = \frac{(r^T \Sigma^{-1} s_i)^2}{(r^T \Sigma^{-1} r)(s_i^T \Sigma^{-1} s_i)}$



This detector would be appropriate for finding a single mine type
at the center position in a zero-mean background. To extend the
formulation to consider N possible mine types at the center
position in a zero-mean background (corresponding to prior
knowledge regarding the form of $\theta$), a bank of such detectors is
used and the maximum output is selected:

$$\beta = \max_i \beta_i$$

This formulation also allows classification of the mine by type.
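The bank-of-detectors idea, with the index of the largest output doubling as a class label, can be sketched as follows. This is an illustration under simplifying assumptions: the data are taken to be real-valued and already whitened (so the covariance is the identity), and the signature names and values are hypothetical.

```python
def cos2(r, s):
    """Cosine-squared between r and s, assuming pre-whitened data (Sigma = I)."""
    rs = sum(a * b for a, b in zip(r, s))
    rr = sum(a * a for a in r)
    ss = sum(b * b for b in s)
    return rs * rs / (rr * ss)


def detector_bank(r, signatures):
    """Return (largest statistic, name of the signature that produced it)."""
    scores = {name: cos2(r, s) for name, s in signatures.items()}
    best = max(scores, key=scores.get)
    return scores[best], best


# Hypothetical mine signatures (illustration only).
signatures = {
    "type_A": [1.0, 0.8, 0.5, 0.2],
    "type_B": [0.2, 0.5, 0.8, 1.0],
    "type_C": [1.0, -1.0, 1.0, -1.0],
}

# A measurement that is roughly 2 * type_B plus a small perturbation.
r = [0.41, 0.98, 1.63, 1.95]
beta, label = detector_bank(r, signatures)
```

Here `beta` plays the role of the detection statistic and `label` is the classification by mine type; the unknown scale factor on the measurement does not affect either.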


When  clutter  is  encountered,  the  zero  mean  assumption  on  the 
measured  signal  under  the  null  hypothesis  is  clearly  invalid. 
There  are  two  mechanisms  by  which  to  modify  the  detector 
described  above  to  consider  the  case  of  discrete  clutter.  In  the 
first,  we  considered  an  approach  described  in  [10]  wherein 
clutter  is  assumed  to  lie  in  a  rank  t  “interference”  subspace  and  is 
thus  treated  separately  from  the  noise  or  background.  Although 
this  is  probably  an  accurate  model,  in  the  application  we  are 
considering  there  was  not  enough  data  to  accurately  model  the 
subspace. For example, for the M = 20 clutter items contained in
the data set, it is not usually possible to write one of the clutter
signals as a linear combination of the other M - 1 signals, i.e. the
measured signals do not span the clutter subspace.


The second approach, which was adopted here, is to utilize the
output of the bank of cosine filters as a feature set and to develop
statistics for that feature set under the two hypotheses. These
statistics can then be used in the formulation of a likelihood ratio.
Let $t$ be the 11 x 1 data vector formed by concatenating the
sorted $\beta_i$; then the likelihood ratio for this data set is

$$\frac{f(t \mid H_1)}{f(t \mid H_0)}.$$

To complete the formulation of this detector, the pdfs of $t$ under
each hypothesis must be estimated. These estimates were
obtained using the calibration data collected in conjunction with
the JUXOCO experiment. A uniform distribution with
independent components was used to model $f(t \mid H_1)$ and a
Gaussian distribution (both the mean and the covariance
structure were estimated) was used to model $f(t \mid H_0)$. A simple
extension of this approach allows the incorporation of spatial
data.
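A likelihood ratio built from a uniform model under $H_1$ and a Gaussian model under $H_0$ can be sketched in the log domain as below. This is a simplified illustration, not the authors' implementation: the Gaussian is given a diagonal covariance (the text estimates a full covariance structure), the uniform support is assumed to be $[0, 1]$ per component because cosine-squared outputs lie in that range, and the calibration values are invented.

```python
import math


def log_pdf_h1(t):
    """Independent uniform components on [0, 1]; log-density is 0 inside the support."""
    return 0.0 if all(0.0 <= v <= 1.0 for v in t) else -math.inf


def fit_h0(samples):
    """Per-component mean and variance from clutter calibration vectors."""
    n, dim = len(samples), len(samples[0])
    mean = [sum(row[j] for row in samples) / n for j in range(dim)]
    var = [sum((row[j] - mean[j]) ** 2 for row in samples) / (n - 1)
           for j in range(dim)]
    return mean, var


def log_pdf_h0(t, mean, var):
    """Gaussian log-density with diagonal covariance (a simplification)."""
    return sum(-0.5 * math.log(2.0 * math.pi * v) - 0.5 * (x - m) ** 2 / v
               for x, m, v in zip(t, mean, var))


def log_likelihood_ratio(t, mean, var):
    return log_pdf_h1(t) - log_pdf_h0(t, mean, var)


# Hypothetical sorted filter-bank outputs measured over clutter.
clutter = [[0.10, 0.20, 0.30], [0.15, 0.25, 0.35],
           [0.05, 0.22, 0.28], [0.12, 0.18, 0.33]]
mean, var = fit_h0(clutter)

llr_clutter_like = log_likelihood_ratio(sorted([0.11, 0.21, 0.31]), mean, var)
llr_mine_like = log_likelihood_ratio(sorted([0.30, 0.55, 0.97]), mean, var)
```

A feature vector that resembles the clutter calibration data yields a low ratio, while one far from the clutter statistics yields a high ratio, which is then compared to a threshold.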


4.  RESULTS 


In  Figure  1,  ROC  curves  are  presented  for  three  algorithms.  The 
performance  of  the  baseline  algorithm  is  shown  with  a  solid  line 
and  represents  the  performance  obtained  when  EMI  detectors  are 
operated  in  a  traditional  energy-detection  mode.  The  energy  of 
the  signal  measured  at  the  center  of  each  grid  square  was 
reported  to  JUXOCO  for  scoring.  Clearly,  energy,  which  is 
proportional  to  metal  content  and  inversely  proportional  to 
distance  between  the  object  and  the  sensor,  does  not  provide  a 
good  discriminator  between  landmines  and  clutter  or  ground  at 
this  site.  The  performance  of  the  cosine  detector  operating  on  the 
signal  measured  at  the  center  of  each  grid  square  is  shown  with 
the dashed line. This algorithm is the coherent CFAR ASD
described in [9], modified to utilize the maximum statistic over
each of the signals, $s_i$. Also shown is the performance of the
modified algorithm that utilizes the output of the coherent CFAR
ASD for each potential mine signal as the input to the likelihood
ratio. It is labeled with the notation ‘clutter’ to indicate that for
this formulation the statistics of the discrete clutter objects are
included in the formulation, which is not true for the standard
coherent CFAR ASD. Including a clutter model improves the
performance of this algorithm, and both approaches perform
dramatically better than the baseline.


Figure 1. ROCs for the energy detector, CFAR ASD detector
and the modified CFAR ASD detector on center point data.

In  Figure  2,  a  similar  set  of  curves  is  provided;  however,  in  this 
figure  the  spatial  information  has  been  incorporated  into  the  two 
CFAR  ASD  formulations.  Again,  including  a  clutter  model 
improves  the  performance  of  this  algorithm,  and  both  approaches 
perform  dramatically  better  than  the  baseline  algorithm.  In  the 
lower  false  alarm  rate  region  there  is  little  difference  between  the 
performance  of  the  two  algorithms.  This  may  be  a  result  of  the 
fact  that  spatial  information  for  the  low-metal  mines  falls  off  into 
the  noise  floor  faster  than  for  the  larger  metal  mines.  In  the  high 
Pd  range,  the  modified  ASD  detector  performs  better  than  the 
standard  ASD  detector. 


Figure 2. ROCs for the energy detector, CFAR ASD detector
and the modified CFAR ASD detector with spatial information incorporated.

As  mentioned  previously,  it  is  possible  to  obtain  classification 
information  as  the  detector  is  implemented  as  a  bank  of 
processors,  each  tuned  to  a  particular  mine.  We  utilized  the 
processor  which  has  the  maximum  output  to  name  the  mine  and 
this  information  was  sent  to  JUXOCO  for  scoring.  Using  this 
approach, the mines were “named” correctly 65% of the time.

5. SIGNAL SEPARATION

One  issue  not  considered  in  the  above  development  is  the  case  of 
overlapping  signals.  In  actual  field  environments,  objects  are 
often  in  close  proximity  and  thus  the  measured  response  will 
consist  of  a  combination  of  the  responses  from  the  individual 
objects.  For  algorithms  such  as  those  described  above,  these 
signals  must  be  separated  prior  to  their  application.  Independent 
Components  Analysis  (ICA)  has  also  been  proposed  as  a  viable 
solution  for  the  problem  of  blind  source  separation  [12-15], 
although  it  has  not  been  explored  in  the  subsurface  sensing 
application.  Therefore,  an  ICA  algorithm  has  been  implemented 
to  test  the  feasibility  of  this  approach  for  the  problem  of 
separating  two  landmine  signatures  from  a  set  of  measured 
responses. 

We considered the signatures of two ordnance items. The in-phase
(solid black) and quadrature (dashed black) measured data
as a function of frequency for the objects in isolation are shown
in the top two panels of Figure 3. These signals were then
“mixed”, and two of the resultant signals are plotted in the
middle two panels of Figure 3. The mixing coefficients were
selected  so  that  the  mixing  is  consistent  with  signatures  that 
could  be  expected  for  EMI  data  measured  at  eleven  different 
spatial  locations.  The  bottom  two  panels  show  the  two 
“independent  components”  extracted  by  a  simple  implementation 
of  the  ICA  algorithm.  As  is  typical  with  ICA  algorithms,  the 
signal  with  the  highest  energy  is  extracted  first  and  with  the 
highest  fidelity.  Clearly,  this  simple  implementation,  which  has 
not  been  optimized  for  this  problem,  does  an  excellent  job  of 
extracting  the  signature  of  one  of  the  objects  and  a  reasonable  job 
of  extracting  the  second. 
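The separation step can be illustrated with a deliberately small ICA sketch. It is not the implementation used for Figure 3: it assumes only two sources and two mixtures (the text mixes to eleven spatial locations), synthetic non-Gaussian sources in place of EMI signatures, hypothetical mixing coefficients, and a kurtosis contrast maximized by a grid search over the rotation angle after whitening.

```python
import math
import random


def center(x):
    mu = sum(x) / len(x)
    return [v - mu for v in x]


def whiten(x1, x2):
    """Decorrelate two zero-mean signals and scale each to unit variance."""
    n = len(x1)
    a = sum(v * v for v in x1) / n
    c = sum(v * v for v in x2) / n
    b = sum(u * v for u, v in zip(x1, x2)) / n
    phi = 0.5 * math.atan2(2.0 * b, a - c)  # principal-axis angle of [[a, b], [b, c]]
    cp, sp = math.cos(phi), math.sin(phi)
    l1 = a * cp * cp + 2.0 * b * cp * sp + c * sp * sp
    l2 = a * sp * sp - 2.0 * b * cp * sp + c * cp * cp
    y1 = [(cp * u + sp * v) / math.sqrt(l1) for u, v in zip(x1, x2)]
    y2 = [(-sp * u + cp * v) / math.sqrt(l2) for u, v in zip(x1, x2)]
    return y1, y2


def kurtosis(y):
    n = len(y)
    m2 = sum(v * v for v in y) / n
    m4 = sum(v ** 4 for v in y) / n
    return m4 / (m2 * m2) - 3.0


def rotate(y1, y2, ang):
    ca, sa = math.cos(ang), math.sin(ang)
    return ([ca * u + sa * v for u, v in zip(y1, y2)],
            [-sa * u + ca * v for u, v in zip(y1, y2)])


def ica_two_sources(x1, x2, steps=90):
    """Whiten, then pick the rotation that maximizes total absolute kurtosis."""
    y1, y2 = whiten(center(x1), center(x2))
    angles = (k * (math.pi / 2.0) / steps for k in range(steps))
    best = max(angles,
               key=lambda ang: sum(abs(kurtosis(c)) for c in rotate(y1, y2, ang)))
    return rotate(y1, y2, best)


def corr(u, v):
    u, v = center(u), center(v)
    num = sum(a * b for a, b in zip(u, v))
    return num / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))


rng = random.Random(0)
n = 4000
s1 = [rng.uniform(-1.0, 1.0) for _ in range(n)]                      # sub-Gaussian source
s2 = [rng.expovariate(1.0) * rng.choice((-1.0, 1.0)) for _ in range(n)]  # Laplace source
x1 = [0.8 * a + 0.3 * b for a, b in zip(s1, s2)]  # hypothetical mixing
x2 = [0.4 * a + 0.9 * b for a, b in zip(s1, s2)]
u1, u2 = ica_two_sources(x1, x2)
```

The recovered components match the sources only up to order and sign, so any comparison (for example with `corr`) has to allow for both ambiguities.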


Figure 3. ICA results. Top panel shows signals measured
using the GEM3 from a Valmara landmine (left) and VS50
landmine (right). Middle panel shows two examples of the
mixed signals that were supplied to the ICA algorithm.
Bottom panel shows the signals as separated by the ICA
algorithm. The horizontal axis of each panel is frequency.

ABSTRACT

Put simply, the global landmine problem is massive. Ground
Penetrating Radar (GPR) is just one engineering solution currently
being investigated. A polynomial amplitude - polynomial phase
model is fitted to GPR returns. It is observed that the second
order phase coefficient shows deviations from background-only
levels when a buried target is present. A bootstrap-based detection
scheme is proposed that tests for this change. The technique is
applied to real GPR data, with encouraging results.

1.  INTRODUCTION 

Anti-personnel landmines have been used in war zones throughout
the world, and have profound effects on civilian populations.
Landmines have a very long life-span, rendering many post-war
areas both useless and dangerous. These minefields can be found
anywhere, including agricultural fields, river banks, urban areas,
transport routes and the surroundings of villages. The effect is a
terrorised and demoralised local population.

In many post-war zones, landmines with little or no metal content
have been found. They are often quite small, made using
a plastic casing and very few, if any, metal parts. Consequently,
conventional metal detectors are not effective countermeasures for
these mines. Metal detectors also suffer from a high false alarm
rate due to shrapnel and debris lodged below the surface.

Surface or ground penetrating radar (GPR) [1,2] works by
detecting discontinuities in the dielectric properties of the soil. The
size and shape of targets made from materials such as plastic can
potentially be determined using this technology. However,
environmental conditions such as soil type and moisture content can
heavily influence the performance of a GPR system. Therefore,
signal processing techniques are needed in order to develop
robust detection schemes which can compensate for changes in
background conditions.

The complex nature of the physical scenario makes it very
difficult to accurately model a GPR return. In this paper,
investigations into a polynomial amplitude - polynomial phase model are
made.


2.  SIGNAL  MODEL  AND  PRELIMINARY 
INVESTIGATIONS 

Fig. 1. Top: Fit of the polynomial amplitude - polynomial phase
model (bold line) to data (fine line) from non-homogeneous soil.
Bottom: Residual of the fit to data.

It is difficult to define a complete, physically-motivated signal model
of the GPR backscatter waveform. However, based on extensive
data analysis, a polynomial amplitude - polynomial phase model
has been proposed (following [3])

$$g_t = \left[\sum_{n=0}^{P_a} a_n t^n\right] \exp\left\{ j \sum_{m=0}^{P_b} b_m t^m \right\} + z_t \tag{1}$$

where $z_t$ is assumed to be stationary interference. The amplitude
and frequency modulation are described by polynomials of order
$P_a$ and $P_b$, respectively.
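To make the model concrete, the sketch below synthesizes a noise-free polynomial amplitude - polynomial phase signal and recovers the second order phase coefficient by differencing the phase twice. The coefficient values are hypothetical, the function names are illustrative, and the phase-differencing step only conveys the idea behind polynomial-phase estimators; it is not the DPT of [5].

```python
import cmath


def poly(coeffs, t):
    """Evaluate sum_k coeffs[k] * t**k by Horner's rule."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * t + c
    return acc


def pap_signal(a, b, n_samples):
    """Noise-free g_t = (sum a_n t^n) * exp(j * sum b_m t^m), t = 0..n_samples-1."""
    return [poly(a, t) * cmath.exp(1j * poly(b, t)) for t in range(n_samples)]


def b2_from_phase(g, t):
    """Second difference of the phase equals 2 * b_2 for a second order phase."""
    second_diff = cmath.phase(g[t + 2] * g[t] * g[t + 1].conjugate() ** 2)
    return second_diff / 2.0


a = [1.0, -0.01]        # hypothetical amplitude polynomial (order 1)
b = [0.3, 0.2, 0.004]   # hypothetical phase polynomial (order 2, a chirp)
g = pap_signal(a, b, 50)
b2_hat = b2_from_phase(g, 10)
```

The recovery is exact here only because there is no interference term and the amplitude stays positive; with real returns the coefficients are estimated from many samples.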

To demonstrate the validity of this signal model, it has been
applied to GPR returns from clay soil, containing a small plastic
target, denoted ST-AP(1), a surrogate for the M14 anti-personnel
(AP) mine [4]. This target has a PVC casing and is filled with
paraffin wax. A solid stainless steel cylinder 5 cm in diameter
and length is also present, denoted by SS05x05. Phase and
amplitude parameters are estimated using the DPT [5] and the method
of least-squares respectively.

Results are shown for a single GPR return signal in Figure 1.
Here a window length of 100 samples has been chosen in a section
of backscatter which contains a shallow buried AP mine. Phase
and amplitude models of order 2 and 4 respectively have been
used. The model is observed to closely approximate the GPR
signal. As a result of extensive data analysis of returns from a variety
of soil types, it was found that the polynomial phase may have
order up to 3. It is proposed that a change in the model parameters
will indicate the presence of a target.

The three phase coefficients from a second order phase model
(i.e. chirp) are estimated for a full data set (B-scan) containing the
two targets: ST-AP(1) and SS05x05. From the results in Figure 2,
it is noted that, unlike $b_0$ and $b_1$, $b_2$ appears to show a deviation
at both targets when compared to its underlying background-only
value. This suggests that testing for a change in $b_2$ may be a target
indicator. This was supported by results obtained from a variety of
data sets.


Fig. 2. Phase coefficients from an order 2 model. Vertical dotted
lines indicate the approximate position of ST-AP(1) (left) and
SS05x05 (right) targets.


3.  TEST  ON  THE  PARAMETERS 

From a background region of 100 samples, estimates of $b_2$ are seen
to be approximately Gaussian distributed. A normal probability
plot (Q-Q plot) of $b_2$ is shown in Figure 3. It is suggested that
the mean of this distribution changes in the presence of a target.
Although, for the most part, the approximation appears to be valid,
there does appear to be some deviation from Gaussianity in the
tails.

For the data set shown in Figure 2, $b_2$ is seen to decrease in the
presence of a target. However, in some cases, in particular when
the soil type is loam, $b_2$ increases with the presence of a target.
Therefore, two-sided hypotheses are considered

$$H_0: b_2 = b_{2,0}$$
$$H_a: b_2 \neq b_{2,0}$$

where $b_{2,0}$ denotes the value of $b_2$ under the null (no target present).
The obvious test statistic is

$$T = \frac{\hat{b}_2 - b_{2,0}}{\hat{\sigma}_{\hat{b}_2}}$$

where $\hat{\sigma}_{\hat{b}_2}$ is an estimate of the standard deviation of $\hat{b}_2$. Since the
exact distribution of $\hat{b}_2$ is unknown (it may deviate from Gaussianity
in the tails and has unknown variance), it is proposed that
the bootstrap be used to determine thresholds.


Fig. 3. Normal probability plot of third phase coefficient estimate,
$\hat{b}_2$, from a background region.


4.  BOOTSTRAP-BASED  DETECTION 

The bootstrap was introduced [6] as a tool for estimating the
sampling distribution of statistics when standard methods cannot be
applied. Observations are randomly resampled and the statistics
recomputed, mimicking the process of repeating the experiment.

When this is done a large number of times, the distribution of
the re-computed values approximates the distribution of the
statistic. Consequently, the test statistic can be compared to this
bootstrap distribution, and a hypothesis test performed. More
information on the use of the bootstrap for hypothesis testing can be found
in [7].

The technique used here is described in Table 1. After fitting
the polynomial amplitude - polynomial phase model to the data,
the residuals are whitened using an AR model. A block bootstrap
resampling technique is used due to remaining structure in
the residuals; this ensured that the bootstrap signals were similar
in form to the observed signals. See Figure 4 for examples of
the generated bootstrap signals. The bound, $b_{2,0}$, is found from a
region which is known to be target-free.
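The resampling and thresholding steps of the procedure can be sketched as follows. This is a simplified stand-in rather than the method of Table 1: the model fit, AR whitening and re-estimation of the phase coefficient are replaced by a sample mean with its standard error, the moving-block scheme is one common form of block bootstrap and may differ from that of [8], and the data, block length and number of resamples are hypothetical.

```python
import random


def block_bootstrap(z, block_len, rng):
    """Moving-block bootstrap: concatenate randomly chosen overlapping blocks of z."""
    n = len(z)
    out = []
    while len(out) < n:
        start = rng.randrange(n - block_len + 1)
        out.extend(z[start:start + block_len])
    return out[:n]


def mean_and_se(x):
    """Stand-in estimator: sample mean and its standard error."""
    n = len(x)
    mu = sum(x) / n
    var = sum((v - mu) ** 2 for v in x) / (n - 1)
    return mu, (var / n) ** 0.5


def bootstrap_test(x, null_value, n_boot=999, block_len=5, alpha=0.05, seed=0):
    """Two-sided test of H0: parameter == null_value with bootstrap thresholds."""
    rng = random.Random(seed)
    est, se = mean_and_se(x)
    t_obs = (est - null_value) / se
    t_star = []
    for _ in range(n_boot):
        est_b, se_b = mean_and_se(block_bootstrap(x, block_len, rng))
        # Bootstrap statistics are centred on the original estimate.
        t_star.append(abs(est_b - est) / se_b)
    t_star.sort()
    threshold = t_star[int(round((1.0 - alpha) * (n_boot + 1))) - 1]
    return abs(t_obs) > threshold, t_obs, threshold


data_rng = random.Random(1)
x = [data_rng.gauss(0.0, 1.0) for _ in range(100)]  # hypothetical background estimates

rejected_far, t_far, thr_far = bootstrap_test(x, null_value=5.0)
rejected_same, t_same, thr_same = bootstrap_test(x, null_value=mean_and_se(x)[0])
```

Centring the bootstrap statistics on the original estimate, rather than on the null value, is what lets the resampled distribution approximate the null distribution of the test statistic even when a target is present.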

5.  APPLICATION  TO  REAL  DATA 

The proposed detector has been applied to GPR data collected
at the Defence Science and Technology Organisation, Salisbury,
South Australia. An FR-127-MSCB Impulse GPR (ImGPR)
system developed by the Commonwealth Scientific and Industrial
Research Organisation (CSIRO, Australia) has been used for these
measurements [4, 9]. The system collects 127 returns, or
soundings, per second, each composed of 512 samples with 12 bit
accuracy. The sounding range may vary from 4 ns to 32 ns. The GPR
system uses bistatic bow-tie antennas which transmit wide-band,
ultra-short duration pulses.

In this experiment, the antennas had a centre frequency of 1.4
GHz and 80% bandwidth. The GPR unit is suspended above the
ground surface at a height of between 0.5 and 2 cm. Its motion
is controlled by a stepper motor unit running along a track at a
constant velocity, as shown in Figure 5. Since the motion of the
GPR is controlled by a stepper motor, with constant speed, running
on a straight track, these samples correspond to distances from the
starting point of the run.

1. If $g_n$ for $n = 0, \ldots, N-1$ is a sampled GPR return
   signal, fit a polynomial amplitude - polynomial
   phase model to the data. From the model, an estimate
   of the second order phase parameter, $\hat{b}_2$, is made.

2. Form the residual signal $r_n = g_n - \hat{g}_n$, where $\hat{g}_n$ is
   the polynomial amplitude - polynomial phase model
   corresponding to estimated parameters.

3. Whiten $r_n$ by removing an AR model of suitable order to
   obtain the innovations $z_n$.

4. Re-sample from $z_n$ N times using the Block Bootstrap [8]
   to obtain $z_n^*$.

5. Repeat step 4 B times to obtain $z_n^{*1}, \ldots, z_n^{*B}$.

6. Generate B bootstrap residual signals $r_n^{*i}$, for
   $i = 1, \ldots, B$, by filtering $z_n^{*i}$ with the AR process obtained
   in 3.

7. Generate B bootstrap signals $g_n^{*i} = \hat{g}_n + r_n^{*i}$, for
   $i = 1, \ldots, B$.

8. Estimate the third phase coefficient from $g_n^{*i}$ to obtain
   $\hat{b}_2^{*i}$ for $i = 1, \ldots, B$.

9. Calculate the bootstrap statistics
   $T^{*i} = (\hat{b}_2^{*i} - \hat{b}_2) / \hat{\sigma}_{\hat{b}_2^{*i}}$, for
   $i = 1, \ldots, B$.

10. Compare the test statistic $T = (\hat{b}_2 - b_{2,0}) / \hat{\sigma}_{\hat{b}_2}$
    to the empirical distribution of $T^*$. Reject $H_0$ if $T$ is in the tail
    of the distribution, otherwise retain $H_0$.

Table 1. Bootstrap-based testing of $b_2$.

Fig. 4. Top: Fit of the polynomial amplitude - polynomial phase
model (bold line) to data (fine line) in the region of interest.
Bottom: Bootstrap signals generated using the model and blocked
bootstrap resampling.

Fig. 5. The ImGPR unit running over a sandbox.

Target     Surrogate   Dimensions (diam x hght)   Orientation
ST-AP(1)   M14         52 x 42 mm                 not critical
ST-AP(2)   PMN         118 x 50 mm                sensitive
ST-AP(3)   PMN2        115 x 53 mm                sensitive

Table 2. Minelike surrogate targets used.

Some  of  the  targets  used  in  the  trial  are  listed  in  Table  2.  The 
PMN  and  PMN-2  are  AP  mines  with  non-metallic  casings.  The 
M14  is  an  AP  mine  with  almost  no  metal  content  and  small  size. 
As  such  it  is  a  very  difficult  target  for  detection. 

Shown in Figures 6, 7 and 8 are sample results for three
surrogate mines as well as various other “targets”. The location of
the first target in each scenario is around the 300th trace, while
the second target is around the 690th trace. Overall, the results
are very encouraging. It can be said that the targets have been
correctly detected, while false alarms appear to be concentrated
in areas near the targets, rather than true false alarms that occur
far from the targets in background-only areas. These near-target
false alarms may be triggered by disturbances to the soil structure
caused by the burying of the target.

All results included here were obtained from targets buried
at approximately 5 cm below the surface. Results from shallow
buried targets in the range 0.5 cm to 2 cm below the surface
produce a more significant change in the test statistic, and the
detection region has greater spread. A comparative investigation of
alternate models and detection schemes is continuing. Testing on
different scenarios with different soil types is also ongoing.

6.  CONCLUSIONS 

From preliminary results it has been seen that when a polynomial
amplitude - polynomial phase model is fitted to GPR returns,
changes in the estimated second order phase parameter indicate
the presence of a target. At present, the bootstrap is being utilised
to estimate the distribution of the parameter. This has yielded a
detector that has shown very encouraging results when run on real
GPR data.

It should be stressed that the method presented in this paper
is purely for detection. Classification is omitted. Classification
of detected targets would also be required for effective landmine
clearance. Following a detection stage, techniques such as multiple
test procedures [10] and time-frequency signatures [11] have
been applied for this purpose.

Fig. 6. Top: B-scan with ST-AP(1) and SS05x05 targets buried at
5 cm in clay. Bottom: Corresponding test statistics (thin line) and
detection decision (thick line).

Fig. 7. Top: B-scan with ST-AP(2) target (parallel orientation) and
an aluminium soft-drink can buried at 5 cm in clay. Bottom:
Corresponding test statistics (thin line) and detection decision (thick
line).

Fig. 8. Top: B-scan with ST-AP(3) target (perpendicular orientation)
and shrapnel buried at 5 cm in clay. Bottom: Corresponding
test statistics (thin line) and detection decision (thick line).

ABSTRACT

This paper proposes a method to detect infrared land
mine signatures embedded in rotationally invariant colored
noise. A common problem in statistical image processing
is high dimensionality. This causes a need for
large sets of training data. To overcome this, an alternative
formulation of the Generalized Likelihood Ratio
Test (GLRT) is presented. This formulation makes it
possible to utilize the circular symmetry, rendering a
substantial decrease in model dimensionality and consequently
in the amount of training data needed. Simulations
indicate that a significant gain in performance
can be achieved compared to both the non-parameterized
detector and the matched filter.

1.  INTRODUCTION 

The presence of land mines is one of the worst environmental
problems that faces humanity. Each year, 10000
people are killed and 30000 are injured in mine related
accidents. Traditional techniques to detect and remove
buried mines are both dangerous and time consuming,
urging the need for more effective methods. One of
the emerging techniques that has gained the most attention
is infrared imaging [1]. Detecting buried mines
using infrared imaging is possible since a buried object
will interfere with the natural heat and mass transfer
constantly taking place in the soil and at the surface.
The result is a thermal signature at the surface that
may be detected by an infrared imaging system.

The thermal signature will be embedded in noise
caused by fluctuations in the soil structure and the
surface. In order to design a detector it is necessary
to model the characteristics of the noise. Such a model
should be accurate enough to incorporate the vital features
of the noise, while still simple enough to facilitate
implementation. In practice, the distribution of
the noise is not known, but may be estimated from off-line
data. Assuming a Gaussian distribution, the issue
reduces to finding the second order statistics.

The main problem when utilizing covariance-based
methods in image processing is that the dimensionality
is often very high. As a result the number of parameters
that are to be estimated is considerable, requiring
large sets of training data and memory expensive algorithms.
An alternative approach is to use some basic
features of the noise enabling parameterization of
the covariance by means of a small set of parameters.
Assuming spatial stationarity, one such approach is to
model the colored noise as an autoregressive process [2].
This has previously been employed for land mine detection,
see [5]. In addition to stationarity, it would also
be desirable to exploit that the noise can be considered
rotationally invariant for many relevant backgrounds
such as gravel roads and sand. Further, for cases where
there exists view angle dependence, the measured image
can still have a rotationally invariant probability
density function (pdf) if the camera capturing the surface
has an unknown angle. Rotational invariance has
been studied previously, mainly for the purpose of texture
classification, see for instance [3]. To incorporate a
parameterization of the covariance matrix that models
the circular symmetry is a difficult task. To circumvent
this problem, we present a reformulation of the
detector where we train the linear detector directly.

2.  BASIC  ASSUMPTIONS 

Let $s(x, y)$ denote an infrared image defined over an
area of interest. The image is measured at the coordinates
$(x, y)$, where $x = -G, \ldots, G$ and $y = -G, \ldots, G$,
and the values are stacked into the column vector $s$.
Further, let the image, $s(x, y)$, be an outcome of the
stochastic image, $S(x, y)$, so that $s$ is a realization of
the stochastic vector $S$.

The problem is to decide whether or not there is a
buried land mine at the center of the image. Therefore,
the detection problem, assuming additive noise, is to
distinguish between the two hypotheses

$$H_0: s = n$$
$$H_1: s = n + \theta m. \tag{1}$$

Here, the noise vector, $n$, is the sample vector of the
background noise image, $n(x, y)$. As for the measured
image, $n(x, y)$ is an outcome of the stochastic image
$\mathcal{N}(x, y)$, while $n$ is an outcome of the stochastic vector
$N$. The stochastic image, $\mathcal{N}(x, y)$, is assumed to have a
rotationally invariant pdf, and $N$ is assumed to be zero
mean, Gaussian distributed with covariance matrix $R$.
The matrix $R$ is regarded as unknown and deterministic.
Moreover, the shape of the mine signature, $m$,
is the sample vector of the circularly symmetric image
$m(x, y)$. The vector, $m$, is assumed to be known, and
of unit length ($m^T m = 1$). Finally, the magnitude, $\theta$,
is assumed to be a non-negative, but unknown, deterministic
constant.

The Generalized Likelihood Ratio Test (GLRT) for
the detection problem (1) is

$$\frac{f(s; \hat{\theta}_{ML}, H_1)}{f(s; H_0)} \underset{H_0}{\overset{H_1}{\gtrless}} \gamma_1,$$

where $\hat{\theta}_{ML}$ is the Maximum Likelihood (ML) estimate
of the parameter $\theta$ (assuming $H_1$). For Gaussian noise,
this test is equivalent to the test

$$\hat{\theta}_{ML} \underset{H_0}{\overset{H_1}{\gtrless}} \gamma, \tag{2}$$

for some constant $\gamma$. Unfortunately, the ML estimate
cannot be found since the covariance matrix of the
noise is unknown. Therefore, we search for other estimates
to be used in place of $\hat{\theta}_{ML}$. To our aid we have
$K$ noise samples (images) $n_k$, $k = 1, 2, \ldots, K$, that
provide information about the noise distribution.

3. ESTIMATING THE AMPLITUDE, $\theta$

We study three common estimators of $\theta$ (assuming $H_1$)
and propose a new estimator that utilizes the rotational
invariance.

3.1. Known Covariance Matrix, R

This is the well known case when the noise is Gaussian
with a known covariance matrix. The ML estimate

$$\hat{\theta}_{ML} = \frac{m^T R^{-1} s}{m^T R^{-1} m} \tag{3}$$

can be easily derived from the definitions, see e.g. [4].
Although it cannot be used in practice, it is included
for comparison.

3.2. Unknown Covariance Matrix, R

Although the covariance matrix is unknown, the estimator
structure in (3) can still be used if training data
is used to estimate $R$.

An unbiased, and commonly used, estimate of the
covariance matrix, $R$, is

$$\hat{R} = \frac{1}{K} \sum_{k=1}^{K} n_k n_k^T.$$

Using this estimate instead of the true covariance matrix
in (3), we obtain an alternative estimator of $\theta$

$$\hat{\theta}_{\hat{R}} = \frac{m^T \hat{R}^{-1} s}{m^T \hat{R}^{-1} m} \tag{4}$$

which is close to $\hat{\theta}_{ML}$ whenever $\hat{R}$ is a good approximation
of $R$. The problem is that the number of parameters
in $\hat{R}$ is very large (approximately $(2G+1)^4/2$ for
an image of size $(2G+1) \times (2G+1)$). Therefore, to estimate
it accurately a large number of training images is
needed. It is also hard to utilize the circular symmetry
in the model to reduce the number of parameters.

3.3. Matched Filter

The matched filter is the most frequently used estimator
(and thus detector). It has the advantage of being
both simple and relatively robust, though not optimal
in general. Assuming $H_1$, the matched filter estimate
is (since $m$ is of unit length)

$$\hat{\theta}_{MF} = m^T s. \tag{5}$$

This is the ML estimate if the noise is white, i.e. if
$R \propto I$ ($I$ represents the identity matrix), and we will
argue in Section 3.4 that it is the best that can be
done if we have an unknown covariance matrix and no
training data.
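The estimators of Sections 3.1 to 3.3 can be compared on a toy problem. The sketch below is illustrative only: it uses a three-element "image", a hypothetical covariance matrix and signature, and white Gaussian training noise, none of which come from the paper. It also counts the free parameters of a symmetric covariance matrix to show how quickly the $(2G+1)^4/2$ figure grows.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, row)) + [float(v)] for row, v in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def theta_ml(s, m, R):
    """m' R^-1 s / m' R^-1 m, using the symmetry of R."""
    Rm = solve(R, m)  # R^-1 m
    return dot(Rm, s) / dot(Rm, m)


def sample_cov(noise):
    """Zero-mean sample covariance (1/K) * sum_k n_k n_k'."""
    K, dim = len(noise), len(noise[0])
    return [[sum(n[i] * n[j] for n in noise) / K for j in range(dim)]
            for i in range(dim)]


def theta_mf(s, m):
    """Matched filter estimate for a unit-length signature."""
    return dot(m, s)


def cov_param_count(G):
    """Free parameters of a symmetric covariance for a (2G+1) x (2G+1) image."""
    pixels = (2 * G + 1) ** 2
    return pixels * (pixels + 1) // 2


m = [0.6, 0.8, 0.0]                                        # unit-length signature
R = [[2.0, 0.5, 0.0], [0.5, 1.0, 0.2], [0.0, 0.2, 1.5]]    # hypothetical covariance
rng = random.Random(0)
R_hat = sample_cov([[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(50)])

s0 = [3.0 * v for v in m]  # noise-free image with amplitude theta = 3
estimates = (theta_ml(s0, m, R), theta_ml(s0, m, R_hat), theta_mf(s0, m))
```

In the noise-free case all three estimators return the true amplitude; they differ only in how much noise leaks into the estimate, which is where the quality of the covariance estimate matters.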


3.4.  Proposed  Estimator 

Again, we consider the case where the covariance matrix
is unknown, and therefore the ML estimate is nontrivial
to calculate. As in Section 3.2, we use training
data to estimate the unknown noise parameters, but
here we search for a new formulation of the ML estimate,
for which, in contrast to (4), the circular symmetry
can be utilized to reduce the number of unknown
parameters.

For notation, let $\Theta_n$ denote the amplitude of the
stochastic noise in the mine direction ($\Theta_n = m^T N$),
and let $\theta_n$ be an outcome of $\Theta_n$ ($\theta_n = m^T n$). Also,
let $\Pi^{\perp}$ be the projection matrix onto the orthogonal
complement of $m$ (see footnote 1). Then the measured image, $s$, can
be decomposed as

$$s = \theta \cdot m + n = (\theta + \theta_n) m + \Pi^{\perp} n.$$

Therefore, as $m^T s = \theta + \theta_n$, a reasonable estimate of
$\theta$ is

$$\hat{\theta} = m^T s - \arg\max_{\theta_n} f_{\Theta_n | \Pi^{\perp} N}(\theta_n | \Pi^{\perp} n). \tag{6}$$

In  order  to  prove  that  this  is  indeed  the  ML  estimate, 
we  state  the  following  theorem: 

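Theorem 1 Suppose we observe

$$X = M_0(\theta + N_0) + M_1 N_1$$

where $M_0$ and $M_1$ are matrices of size $q \times p_0$ and $q \times p_1$
respectively, such that $M_0^T M_0 = I$, $M_1^T M_0 = 0$ and
$M_1^T M_1 = I$. Furthermore, $N_0$ and $N_1$ are stochastic
vectors of size $p_0 \times 1$ and $p_1 \times 1$ respectively. Then the
ML estimate of $\theta$ is

$$\hat{\theta}_{ML} = M_0^T x - \arg\max_{n_0} f_{N_0|N_1}(n_0 \,|\, M_1^T x),$$

where $x$ is the outcome of the stochastic vector $X$.

Proof:

$$\begin{aligned}
\hat{\theta}_{ML} &= \arg\max_{\theta} f_X(x; \theta) \\
&= \arg\max_{\theta} f_{N_0,N_1}(M_0^T x - \theta,\; M_1^T x) \\
&= \arg\max_{\theta} f_{N_0|N_1}(M_0^T x - \theta \,|\, M_1^T x) \\
&= M_0^T x - \arg\max_{n_0} f_{N_0|N_1}(n_0 \,|\, M_1^T x) \qquad \blacksquare
\end{aligned}$$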

Consequently, the ML estimate is obtained by correcting
the matched filter estimate, $m^T s$, with an estimate
of the noise contribution, $\theta_n$, and (6) is indeed the ML
estimate.

The problem then reduces to estimating $\theta_n$ given
$\Pi_\perp n$, i.e. $\arg\max_{\theta_n} f_{\Theta_n|\Pi_\perp N}(\theta_n \,|\, \Pi_\perp n)$. Obviously, if we
have no training data, i.e. no information about $R$, the
best estimate is zero (as $\Theta_n$ is zero mean). In other
words, without training data we can do no better than
the matched filter.

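In order to derive a closed form expression for the
ML estimate, we note that for a jointly Gaussian distribution
we have

$$\arg\max_{\theta_n} f_{\Theta_n|\Pi_\perp N}(\theta_n \,|\, \Pi_\perp n) = a^T \Pi_\perp n = a^T \Pi_\perp s$$

for some vector $a$. This in turn yields the estimate as

$$\hat{\theta}_{ML} = (m^T - a^T \Pi_\perp)\, s, \qquad (7)$$

where, from (3),

$$a = m - \frac{R^{-1} m}{m^T R^{-1} m}.$$

$^1$ $\Pi_\perp = I - m m^T$
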

The  advantage  of  this  formulation  is  that  it  enables  us 
to  invoke  a  parameterization  that  utilizes  the  circular 
symmetry  to  reduce  the  number  of  parameters. 
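
As a numerical sanity check of this reformulation, the following sketch compares the known-covariance estimate (3) with the corrected matched filter $(m^T - a^T \Pi_\perp) s$, taking $a = m - R^{-1}m / (m^T R^{-1} m)$. The 3-pixel template, covariance matrix and image are made-up illustration values, and the helper names (`solve`, `ml_known_r`, `ml_via_correction`) are assumptions of this sketch, not part of the paper.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    aug = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def ml_known_r(m, r, s):
    """Estimate (3): m^T R^-1 s / (m^T R^-1 m), with R symmetric."""
    rinv_m = solve(r, m)
    return dot(rinv_m, s) / dot(rinv_m, m)


def ml_via_correction(m, r, s):
    """Matched filter m^T s minus the noise correction a^T Pi_perp s."""
    rinv_m = solve(r, m)
    scale = dot(rinv_m, m)
    a = [mi - ri / scale for mi, ri in zip(m, rinv_m)]
    mf = dot(m, s)
    perp_s = [si - mf * mi for si, mi in zip(s, m)]  # Pi_perp s = s - m (m^T s)
    return mf - dot(a, perp_s)


# Hypothetical example values (m has unit length).
m = [0.6, 0.8, 0.0]
R = [[2.0, 0.5, 0.1], [0.5, 1.5, 0.3], [0.1, 0.3, 1.0]]
s = [1.0, -2.0, 0.5]
theta_direct = ml_known_r(m, R, s)
theta_corrected = ml_via_correction(m, R, s)
```

Both routes give the same number for any symmetric positive definite `R`, and with `R` equal to the identity the correction vanishes and the result reduces to the matched filter output.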

According to the assumption that the pdf of $N(x, y)$
is rotationally invariant, and that $m(x, y)$ is circularly
symmetric, the estimate

$$\hat{\theta}_n = a^T \Pi_\perp n$$

should be invariant to rotations of $n(x, y)$.

For this condition to be (at least almost) fulfilled, $a$
should be the sample vector of some circularly symmetric
continuous image $a(x, y)$. If the images are sampled
sufficiently often compared to their frequency content,
the following Fourier-series-like expansion is adequate

$$a(x, y) = \sum_{k=0}^{G} a_k\, c_k(x, y)$$


where 


Ck(x,y)  = 


2i:k\/x‘2+y'2 

2N 


/x2  +  y2  <  G 
lx 2  +  y 2  >  G. 


Hence, we can find a linear parameterization using the
unknown vector $d$ as

$$a = C d,$$

where the $k$'th column of the matrix $C$ is the sample
vector of $c_{k-1}(x, y)$, and the $k$'th element of the vector
$d$ is $a_{k-1}$. Thus,

$$\hat{\theta}_{ML} = (m^T - d^T C^T \Pi_\perp)\, s \qquad (8)$$

and we have found a representation where the $(G + 1)$-dimensional
vector $d$ is all that is to be estimated from
training data. We have therefore reduced the number
of parameters to $G + 1$. To estimate $d$, we utilize some
measured noise samples $n_k$, $k = 1, 2, \ldots, K$, and solve
the least squares problem

$$\hat{d} = \arg\min_{d} \sum_{k=1}^{K} \left( m^T n_k - (\Pi_\perp n_k)^T C d \right)^2.$$

Using this estimate in (8), we obtain the following estimate
for $\theta$

$$\hat{\theta} = (m^T - \hat{d}^T C^T \Pi_\perp)\, s. \qquad (9)$$
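
The training step and the resulting estimator can be sketched as follows. This is a minimal illustration under stated assumptions: images are short flat vectors, the basis matrix `basis` (playing the role of $C$) is an arbitrary made-up matrix rather than sampled radial functions, and the noise is constructed so that its component along $m$ is exactly linear in $\Pi_\perp n$, which lets the fit be checked against a known `d_true`. The function names are hypothetical.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def project_out(m, v):
    """Apply Pi_perp = I - m m^T to v (m must have unit length)."""
    coef = dot(m, v)
    return [vi - coef * mi for vi, mi in zip(v, m)]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    aug = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_d(m, basis, noise_samples):
    """Least-squares fit of d via the normal equations; basis[i][j] = C[i][j]."""
    n_par = len(basis[0])
    columns = [[row[j] for row in basis] for j in range(n_par)]
    rows, targets = [], []
    for n in noise_samples:
        perp = project_out(m, n)
        rows.append([dot(perp, col) for col in columns])  # (Pi_perp n_k)^T C
        targets.append(dot(m, n))                         # m^T n_k
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(n_par)]
            for i in range(n_par)]
    rhs = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(n_par)]
    return solve(gram, rhs)


def estimate_theta(m, basis, d, s):
    """Estimate (9): (m^T - d^T C^T Pi_perp) s."""
    a = [dot(row, d) for row in basis]  # a = C d
    return dot(m, s) - dot(a, project_out(m, s))


# Hypothetical 4-pixel example with G + 1 = 2 parameters.
rng = random.Random(0)
m = [1.0, 0.0, 0.0, 0.0]
basis = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
d_true = [0.5, -0.2]
a_true = [dot(row, d_true) for row in basis]


def make_noise():
    u = [0.0] + [rng.gauss(0.0, 1.0) for _ in range(3)]  # orthogonal to m
    return [ui + dot(a_true, u) * mi for ui, mi in zip(u, m)]


training = [make_noise() for _ in range(6)]
d_hat = fit_d(m, basis, training)
s = [3.0 * mi + ni for mi, ni in zip(m, make_noise())]
theta_hat = estimate_theta(m, basis, d_hat, s)
```

Only $G + 1$ numbers are fitted here, regardless of the image size, which is the reason far fewer training images are needed than for a full covariance estimate.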
Figure 1: The performance of the different detectors
for $P_{fa} = 0.3$, given the number of training data

4.  SIMULATION  RESULTS 

In order to evaluate the proposed approach, we use the
detector as given by (2). In particular, we compare the
detector performance when using the proposed estimator
(8), here denoted Estimated d, to that of employing
the three other estimates described in Section 3. The
test employing the estimate of the full covariance matrix,
R, as given by (4) will be denoted Estimated R,
while the one using the correlation estimate given by
(5) will be denoted Matched Filter. To serve as a reference,
although not possible to implement in reality, we
also include the performance of the detector utilizing a
known covariance matrix, here denoted Known R.

In the simulations we considered images of size 33 x 33.
The background noise was created by passing white
Gaussian noise through an FIR filter with low-pass
characteristics. The rotational invariance was ensured
by using an FIR filter with a circularly symmetric impulse
response. Moreover, the mine signature had a
radius of approximately 5 pixels. The signature is chosen
as a smoothed version of the top view shape of a
cylindrical shaped mine, see [5]. As both the target
and the noise have low-pass characteristics, the detection
problem is especially difficult.

The main contribution in this paper is a new formulation
of the ML estimate that greatly reduces the
number of unknown parameters compared to when the
full covariance matrix is estimated. Therefore, in Figure 1
we study how the probability of detection, $P_D$,
depends on the number of training images, for a given
probability of false alarm, $P_{fa} = 0.3$. Furthermore, to
illustrate how the detectors perform for different probabilities
of false alarm, we plot the receiver operating
characteristics (ROC) using 240 training images in Figure 2.
From the figures, it can be seen that the performance
of the methods which estimate either d or R
improves with the number of training images, whereas,
of course, the performance of the Known R and the
Matched Filter detectors does not depend on the amount
of training data. More importantly, as expected, the
proposed detector needs only a fraction of the training
data required by the method that estimates the full
covariance matrix. In particular, the
proposed scheme only needs a few training images to
outperform the detector employing the matched filter
estimate. It is also interesting to note that for fewer
than 1000 training images Estimated R does not seem
to perform any better than a detector that totally disregards
all measurements, i.e. $P_D = P_{fa}$.

Figure 2: ROC for the four detectors using 240 training
images

5.  CONCLUSIONS 

In this paper we proposed a detector that takes advantage
of the circular symmetry to drastically reduce the
number of unknown parameters in the detector.

Simulations confirm that the need for training data
is greatly reduced, and that the performance is improved
for reasonable amounts of training data.
ABSTRACT

A detector using atomic decomposition with a chirplet dictionary
is analyzed. It is derived from the generalized likelihood ratio
test and has constant false alarm rate. The atomic decomposition
is performed via a genetic algorithm and compared with previous
approaches.

1.  INTRODUCTION 

The atomic decomposition [1], also known as matching pursuit [2]
or adaptive Gabor representation [3], is an adaptive approximation
technique providing a sparse, flexible, and physically meaningful
representation of signals. In spite of its suitability for the analysis
of unknown signals [2], the detection of signals in noise via the
atomic decomposition (AD) has not yet been addressed. In this paper,
we present an AD-based detector designed according to the generalized
likelihood ratio test (GLRT), and analyze its performance in
the Neyman-Pearson sense. We also prove its constant false alarm
rate (CFAR) characteristic in zero-mean, complex, white, Gaussian
noise (CWGN).

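The AD sparseness is attained by a highly redundant dictionary
of unit energy signals, called atoms. Let $D = \{h_\gamma(n)\}$ be a
dictionary of atoms, and $s(n)$ the signal under analysis; the AD is
obtained as

$$\gamma_p = \arg\max_{\gamma} |\langle s_{p-1}(n), h_\gamma(n) \rangle|^2, \qquad (1)$$

and

$$b_p\, e^{j\phi_p} = \langle s_{p-1}(n), h_{\gamma_p}(n) \rangle. \qquad (2)$$

$s_p(n)$ comes from

$$s_p(n) = s_{p-1}(n) - b_p\, e^{j\phi_p}\, h_{\gamma_p}(n), \quad p > 0, \qquad (3)$$

$$s_0(n) = s(n). \qquad (4)$$

The signal under analysis is approximated by

$$s(n) \approx \sum_{p} b_p\, e^{j\phi_p}\, h_{\gamma_p}(n), \quad p = 1, 2, \ldots \qquad (5)$$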
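
The recursion (1)-(5) can be sketched in a few lines. This is only an illustration under simplifying assumptions: the dictionary is a small finite list searched exhaustively (the paper optimizes over continuous chirplet parameters instead), the atoms are unit-energy complex exponentials rather than chirplets, and the function names are hypothetical.

```python
import cmath
import math


def inner(x, h):
    """Inner product <x, h> = sum_n x(n) * conj(h(n))."""
    return sum(xn * hn.conjugate() for xn, hn in zip(x, h))


def atomic_decomposition(signal, dictionary, n_atoms):
    """Greedy extraction of n_atoms atoms; returns [(index, b, phi)], residual."""
    residual = [complex(v) for v in signal]          # Eq. (4): s_0 = s
    atoms = []
    for _ in range(n_atoms):
        # Eq. (1): atom with the largest squared projection onto the residual.
        index = max(range(len(dictionary)),
                    key=lambda i: abs(inner(residual, dictionary[i])) ** 2)
        coef = inner(residual, dictionary[index])    # Eq. (2): b_p * exp(j phi_p)
        b, phi = cmath.polar(coef)
        # Eq. (3): remove the extracted component from the residual.
        residual = [r - coef * h for r, h in zip(residual, dictionary[index])]
        atoms.append((index, b, phi))
    return atoms, residual


# Toy dictionary: 8 orthonormal complex exponentials of length 8.
N = 8
dictionary = [[cmath.exp(2j * math.pi * k * n / N) / math.sqrt(N)
               for n in range(N)] for k in range(N)]
signal = [3.0 * cmath.exp(0.5j) * dictionary[2][n] + dictionary[5][n]
          for n in range(N)]
atoms, residual = atomic_decomposition(signal, dictionary, 2)
```

With this toy signal the first extracted atom is the stronger component (index 2, amplitude 3, phase 0.5) and the second is the weaker one, after which the residual is numerically zero.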


As in [1, 4, 5], we employ a dictionary of chirplets, i.e. chirped
Gabor functions, because linear frequency modulation is a very
common feature of man-made signals and natural signals. In the
following, it is assumed that the signal under analysis is a weighted
sum of chirplets. The chirplet dictionary is parameterized through
the 4-component vector

$$\gamma = [\alpha, \beta, T, f]^T, \qquad (6)$$

and every chirplet is defined as$^1$

$$h_\gamma(n) = \left(\frac{2\alpha}{\pi}\right)^{1/4} e^{-\alpha (n-T)^2}\, e^{j[2\pi f (n-T) + \pi \beta (n-T)^2]}. \qquad (7)$$

Hence, every extracted atom is defined by means of the 6-component
vector

$$[b_p, \gamma_p^T, \phi_p]^T \qquad (8)$$

with $b_p$ a positive real number, and $b_p^2$ the energy of the $p$th extracted
atom.

This work was supported by the National Board of Scientific and Technology
Research (CICYT) under project TIC-99-1172-C02-01/02, and by
an FPU fellowship of the Ministry of Education.
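
A chirplet atom can be generated and its unit-energy property checked numerically as sketched below. The code assumes the form $h_\gamma(n) = (2\alpha/\pi)^{1/4} e^{-\alpha(n-T)^2} e^{j[2\pi f(n-T) + \pi\beta(n-T)^2]}$ with unit sampling rate; the exact placement of $\pi$ in the chirp term is an assumption of this sketch, and the function name `chirplet` is hypothetical.

```python
import cmath
import math


def chirplet(n_samples, alpha, beta, t0, f):
    """Sampled chirplet: Gaussian envelope times a linear-FM complex exponential."""
    amplitude = (2.0 * alpha / math.pi) ** 0.25   # makes the energy close to 1
    samples = []
    for n in range(n_samples):
        d = n - t0
        phase = 2.0 * math.pi * f * d + math.pi * beta * d * d
        samples.append(amplitude * math.exp(-alpha * d * d) * cmath.exp(1j * phase))
    return samples


# Parameter values taken from the chirplet used in the detection experiments.
atom = chirplet(1024, alpha=0.001, beta=0.003, t0=500, f=0.25)
energy = sum(abs(v) ** 2 for v in atom)
```

The energy depends only on the envelope, so it stays close to one for any `beta` and `f` as long as the envelope is well inside the observation window.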

2.  AD  USING  A  GENETIC  ALGORITHM 

The optimization procedure for (1) has to be carefully chosen because
of the extremely complex structure of the objective function,
with multiple local optima coming from the existence of noise and
multi-component signals, and domain regions where it is nearly
constant. Therefore, global search algorithms refined by descent
techniques are the most suitable strategies.

We use a genetic algorithm (GA) refined with a downhill simplex
method [6]. The selected GA is the most popular one described
in the literature [7]. A detailed description of its parameter values
is given in Table 1. The probability of crossover has been fixed to
1 and the population size to 200 in order to reduce the premature
convergence to local optima [7]. The search range for the components
of vector (6) is application dependent. The remaining GA
parameters take typical values [7].

Other authors propose different algorithms to optimize (1). In
Table 2, we show the average time employed to find the first atom
of an atomic decomposition using different approaches and their
complexity. GAAD is our algorithm, and TFAD64 and TFAD512
are approaches by O'Neill and Flandrin using the ambiguity function
with resolution 64 and 512 respectively [1]$^2$. GMP is a version
of [5] with a subdictionary of more than 20000 signals. $N$ is the
number of signal samples ($N = 1024$). As can be noticed, the
GAAD complexity is linear with regard to $N$. On the other hand,
TFAD64 is the most efficient. However, it can suffer from worse
detection performance than the GAAD, as will be shown. The
MATLAB simulations were run on a Pentium II, 350 MHz, 256
MB RAM, with Windows 98.

$^1$ It is assumed that the sampling rate is one and the chirplets are time-
and band-limited before sampling.

$^2$ Their MATLAB programs are available at http://mdsp.bu.edu/jcjfo

GA parameter                 Values
Population size              200
Number of generations        20
Number of encoding bits      16
Probability of crossover     1
Probability of mutation      0.03
Search range for $\alpha$    $[10^{-6}, 10^{-1}]$
Search range for $\beta$     [-0.1, 0.1]
Search range for $T$         [0, 1024]
Search range for $f$         [-0.5, 0.5]

Table 1. Genetic algorithm parameters used in the simulations.


Algorithm    Time in seconds    Complexity
GAAD         31                 $O(N)$
TFAD512      40                 $O(N \log_2 N)$
TFAD64       5                  $O(N \log_2 N)$
GMP          150                $O(N^2 \log_2 N)$

Table 2. Computation time and complexity of the first atomic extraction
for several AD techniques.


3. DETECTION OF A CHIRPLET IN NOISE USING
ATOMIC DECOMPOSITION


In this section, the detection of a chirplet in noise using the atomic
decomposition is related to the GLRT. The detection is formulated
in terms of a binary hypothesis test where the null and alternative
hypotheses are:

$$H_0: x(n) = r(n), \quad n = 1, \ldots, N \qquad (9)$$

$$H_1: x(n) = b\, e^{j\phi} h_\gamma(n) + r(n), \quad n = 1, \ldots, N, \qquad (10)$$

where $r(n)$ is a CWGN with power $\sigma^2$, $b$ is a positive real number, $b^2$ is
the chirplet energy, and $N$ is the number of samples. Using vector
notation, the probability density function (pdf) under $H_0$ becomes

$$f_{H_0}(x; \sigma) = \frac{1}{(\pi \sigma^2)^N} \exp\left(-\frac{\|x\|^2}{\sigma^2}\right), \qquad (11)$$

and under $H_1$

$$f_{H_1}(x; \sigma, b, \gamma, \phi) = \frac{1}{(\pi \sigma^2)^N} \exp\left(-\frac{\|x - b\, e^{j\phi} h_\gamma\|^2}{\sigma^2}\right). \qquad (12)$$

To evaluate the GLRT, the maximum likelihood estimate (MLE)
under $H_1$ and $H_0$ must be calculated. Under $H_1$ the MLE is the
first extracted atom of the atomic decomposition [1]. Namely, it
leads to equations (1) and (2) for the parameters of the first extracted
atom. Regarding $\sigma^2$, we can estimate it as

$$\hat{\sigma}^2 = \frac{\|x\|^2 - \hat{b}^2}{N}. \qquad (13)$$

In the following, we refer to the extracted atom parameters as $\hat{b}$,
$\hat{\alpha}$, $\hat{\beta}$, $\hat{T}$, $\hat{f}$, and $\hat{\phi}$ to stress that they are estimates of the parameters
of vector (8). For $H_0$, the MLE of $\sigma^2$, $\hat{\sigma}^2_{0n}$, becomes

$$\hat{\sigma}^2_{0n} = \frac{\|x\|^2}{N}. \qquad (14)$$

After some manipulations, the GLRT turns into

$$L_{GLR}(x) = \frac{\hat{b}^2}{\hat{\sigma}^2_{0n}} \underset{H_0}{\overset{H_1}{\gtrless}} Th, \qquad (15)$$

where $Th$ is the threshold. Thus, for the one-chirplet-in-noise
case the GLRT depends on the energy of the first extracted atom,
$\hat{b}^2$, and the noise power estimate $\hat{\sigma}^2_{0n}$.

3.1.  Analytic  model 

The detector (15) allows an approximate analytic treatment if independence
between the atom energy and the noise power estimates
is assumed. Under $H_0$, $\hat{\sigma}^2_{0n}$ becomes a (scaled) chi-square random variable
of $2N$ degrees of freedom. Namely, its pdf is

$$f_{\hat{\sigma}^2_{0n}}(z; \sigma) = \frac{N^N}{(N-1)!\,(\sigma^2)^N}\, z^{N-1} \exp\left(-\frac{N z}{\sigma^2}\right), \quad z > 0. \qquad (16)$$

On the other hand, it has been found through Monte Carlo analysis
that $\hat{b}^2$ has a lognormal distribution under $H_0$, i.e.

$$f_{\hat{b}^2}(z; \mu_n, \sigma_n) = \frac{1}{\sqrt{2\pi}\, \sigma_n z} \exp\left(-\frac{(\ln(z) - \mu_n)^2}{2 \sigma_n^2}\right), \quad z > 0. \qquad (17)$$

This distribution gets a high significance level in the composite
chi-square goodness-of-fit test. The lognormal model is valid for
the GA of Section 2 varying the population size, the number of
generations, and the noise power. Parameters $\mu_n$ and $\sigma_n$ are estimated
using the maximum likelihood criterion. For the first extracted
atom, their estimates follow

$$\hat{\mu}_n = \ln(\sigma^2) + [1.9037 + 0.0104 \ln(ngen) + 0.1050 \ln(psize)], \qquad (18)$$

$$\hat{\sigma}_n = -0.0239 \ln(ngen) - 0.0281 \ln(psize) + 0.4129. \qquad (19)$$

$psize$ and $ngen$ are the GA population size and number of generations,
respectively. As can be noticed, $\hat{\sigma}_n$ does not depend on
the noise power. Then, the probability of false alarm, $P_{fa}$, for a
given threshold $Th$ is expressed by

$$P_{fa} = \iint_{\{y > Th\, z\}} f_{\hat{b}^2}(y)\, f_{\hat{\sigma}^2_{0n}}(z)\, dy\, dz. \qquad (20)$$

Integral (20) can be calculated by numerical methods. It can be
proved that it does not depend on the noise power; therefore, the
GLRT has CFAR characteristic with respect to the noise power.
Fig. 1 shows the $P_{fa}$ curve versus the threshold $Th$ obtained
through simulations and using the analytic approximation (20). As
can be appreciated, the agreement is very good. Mismatches are
due to the fact that expressions (18) and (19) are obtained by fitting
the lognormal model parameters within the range of the studied
GA parameters, and due to the variance of the $P_{fa}$ estimation by
Monte Carlo analysis.
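
One possible way to evaluate integral (20) numerically is sketched below. It integrates out $y$ analytically using the lognormal survival function and applies a midpoint rule over $z$ on a window of plus or minus eight standard deviations around $\sigma^2$. The window width, step count and function names are choices of this sketch, not of the paper; the fitted coefficients are those of (18) and (19).

```python
import math


def lognormal_params(noise_power, ngen, psize):
    """Fitted lognormal parameters of the atom energy under H0, Eqs. (18)-(19)."""
    mu = math.log(noise_power) + (1.9037 + 0.0104 * math.log(ngen)
                                  + 0.1050 * math.log(psize))
    sigma = -0.0239 * math.log(ngen) - 0.0281 * math.log(psize) + 0.4129
    return mu, sigma


def noise_power_pdf(z, noise_power, n):
    """Pdf (16) of the H0 noise power estimate, evaluated in the log domain."""
    log_pdf = (n * math.log(n) + (n - 1) * math.log(z) - math.lgamma(n)
               - n * math.log(noise_power) - n * z / noise_power)
    return math.exp(log_pdf)


def pfa(threshold, noise_power, n=1024, ngen=20, psize=200, steps=2000):
    """Midpoint-rule evaluation of Eq. (20)."""
    mu, sigma = lognormal_params(noise_power, ngen, psize)
    half_width = 8.0 * noise_power / math.sqrt(n)   # about 8 standard deviations
    lo = max(noise_power - half_width, noise_power * 1e-9)
    hi = noise_power + half_width
    dz = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        z = lo + (i + 0.5) * dz
        # P(b^2 > Th * z) for a lognormal b^2.
        arg = (math.log(threshold * z) - mu) / (sigma * math.sqrt(2.0))
        total += 0.5 * math.erfc(arg) * noise_power_pdf(z, noise_power, n) * dz
    return total


pfa_unit_noise = pfa(15.0, 1.0)
pfa_scaled_noise = pfa(15.0, 7.5)
```

The two values agree because scaling the noise power shifts $\hat{\mu}_n$ by the same amount as it scales $z$, which is the CFAR property stated above.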
Fig. 1. $P_{fa}$ simulated and calculated through the analytic approximation.
GA population size of 200 and 20 generations. 1000
Monte Carlo trials.


It has also been checked that TFAD64 and TFAD512 follow
the lognormal model in the noise-only case. This allows the
threshold required for low $P_{fa}$ to be obtained, as in the GAAD case.

In Figs. 2 and 3, probability of detection ($P_d$) curves of the
GLRT using GAAD, TFAD64, TFAD512 are depicted for $P_{fa}$
equal to $10^{-2}$ and $10^{-6}$ respectively, and a chirplet of features:
$\alpha = 0.001$, $\beta = 0.003$, $T = 500$, and $f = 0.25$. 1000 trials
have been employed. The greatest $P_{fa}$ has been selected in order
to check the agreement between our model and simulations. It has
been seen that there is no difference in using the threshold from
the model and from the simulations. The lowest $P_{fa}$ is typical in
radar applications, and the lognormal model is required to compute
the threshold. The number of samples ($N$) is 1024. ENR means
energy-to-noise power ratio, i.e. $ENR = 10 \log_{10}(b^2/\sigma^2)$. It
depends neither on the signal length, unlike the SNR in [1], nor
on other chirplet features$^3$.

The performance of the energy detector (ED), an FFT of 1024
samples without windowing, and the matched filter (MF) are also
depicted. The ED and the FFT represent classic techniques in signal
detection. MF is the GLRT detector when the only unknown
features of (10) are $b$ and $\phi$. In the ED and MF cases the noise
power is assumed to be known$^4$. For $P_d = 90\%$, GAAD exhibits better
performance than the TFADs. This indicates that the GA performs
better in the search of the global optimum than the algorithm of
the TFADs. For the lowest $P_{fa}$, GAAD is approx. 4 dB worse than
the MF, and the ED is similar to the TFADs, although it does not
provide chirplet feature estimation. Besides, TFAD512 is better
than TFAD64, as expected due to its greater resolution. The extremely
poor performance of the FFT is due to the chirplet chirp
rate, i.e. the chirplet bandwidth is much greater than the FFT-based
filter bandwidth.

$P_d$ depends mainly on the chirplet $\alpha$. If a longer chirplet is
evaluated ($\alpha = 0.0001$, $\beta = 0$, $T = 400$, and $f = 0.25$) the
sensitivity is degraded as Fig. 4 shows. In this case, the FFT is
3SNR.  in  [1]  is  defined  as  SNR  =  10  log10  jfpj.  It  would  be  more 

suitable  to  use  the  chirplet  mean  power,  i.e.  SNR  =  10  log10( b  )■ 

$^4$ The difference between known and unknown $\sigma^2$ for the GAAD and
the TFADs has resulted in less than 0.3 dB for a fixed $P_d$ and 1024 samples.


Fig. 2. Performance of the GLRT using GAAD, TFAD64,
TFAD512. Also MF, ED and FFT detectors are depicted. The
chirplet features are: $\alpha = 0.001$, $\beta = 0.003$, $T = 500$ and
$f = 0.25$. $P_{fa} = 10^{-2}$ and 1000 trials.


Fig. 3. Performance of the GLRT using GAAD, TFAD64,
TFAD512. Also MF, ED and FFT detectors are depicted. The
chirplet features are: $\alpha = 0.001$, $\beta = 0.003$, $T = 500$ and
$f = 0.25$. $P_{fa} = 10^{-6}$ and 1000 trials.


better than TFADs, although it does not give an estimate of $\alpha$. This
is due to the fact that the chirplet is longer and does not have modulation.
It behaves as a narrowband signal. It has been found that
the GAAD performance strongly depends on the $\beta$ search range
(Table 1). Using an exploration range for $\beta$ of [-0.005, 0.005], the
$P_d$ becomes closer to the TFADs behavior. GAAD with this modification
is called GAAD2. For GAAD2, the lognormal model is still
valid to describe the noise statistics. In Fig. 3, an improvement of
less than 1 dB would be obtained if GAAD2 were used instead of
GAAD.

4.  DETECTION  OF  MULTIPLE  CHIRPLETS 

Using the atomic decomposition, we propose a sequential detector
consisting of a unitary decision test, i.e. the one-chirplet-case
GLRT (15), for every extracted atom. When an extracted atom has
an energy-to-noise power ratio estimate greater than the threshold,
it is considered as one of the components forming the signal
under analysis. Otherwise, it comes from noise. The threshold
is obtained through the expressions of the one-chirplet case. As
an example, a 6-chirplet signal of 1024 samples is analyzed by
GAAD. Chirplets 1 to 4 have $\alpha = 10^{-3}$, $\beta = 0$, $f = 0.0714$,
and $T$ equal to 150, 350, 600, and 800, respectively. Chirplets
5 and 6 share $T = 500$ and $f = 0.25$. In the case of chirplet
5, $\alpha = 10^{-3}$ and $\beta = 0.003$. For chirplet 6, $\alpha = 10^{-4}$ and
$\beta = -0.0001$. All of them have ENR = 18 dB$^5$. Fig. 5 shows
the adaptive spectrogram (AS) [3] obtained by GAAD after detection.
The chirplets are correctly recovered as Table 3 shows. The
root mean square error (RMSE) of the chirplet parameter estimation is
shown in Table 4. $\alpha$ of the 6th chirplet is poorly estimated due to
its low probability of detection at ENR = 18 dB$^6$.

Fig. 4. Performance of the GLRT using GAAD, TFAD64,
TFAD512. Also MF, ED and FFT detectors are depicted for a
longer chirplet. GAAD2 means GAAD with a modified $\beta$ range.
The chirplet features are: $\alpha = 0.0001$, $\beta = 0.0$, $T = 400$ and
$f = 0.25$. $P_{fa} = 10^{-6}$ and 1000 trials.

5.  CONCLUSIONS 

The theoretical framework of a detector using AD, which is the
GLRT in the one-chirplet case, has been proposed, and its performance
has been compared with classic techniques and the
matched filter. Additionally, the advantages of the use of a GA
for computing the AD have been pointed out with regard to other
previous AD approaches in terms of complexity, efficiency and detection
performance. For this algorithm, a useful statistical model
of the pdf under the noise-only condition has been presented, allowing
an approximate analytic study of the detector. The detector
dependency on the GA search range and signal characteristics has
been shown through several examples.
Fig. 5. AS of the multiple chirplet signal after GLRT detector
using GAAD.

Table 3. Chirplet parameters estimated by GAAD. 100 trials.

Table 4. Root mean square error of the chirplet parameters estimated
by GAAD. 100 trials.

ABSTRACT

This paper reports on the detection of Gaussian stochastic transients
in multipath environments described by random parameters.
The solutions developed herein correspond to quadratic processors,
with low computational cost and robust to changes in the
statistical models of the channel. As a consequence, only a small
amount of a-priori information is necessary to derive the parameters
of the processor. A recursive form of the processor is also
proposed, allowing for the recursive detection of the signal replicas
arriving at the receiver.

1.  INTRODUCTION 

The classical solution for the detection of signals in a multipath
environment is the generalized likelihood ratio test (GLRT), where
the likelihood test is computed from estimates of the channel parameters
in both hypotheses. In general, the estimation step in the
GLRT is a computationally heavy procedure. Furthermore, when
the signals to detect are stochastic processes in low signal-to-noise
ratio (SNR), which is the usual case in many passive detection
applications, the variances of the estimates are large and the detector
performance degrades. Recently, some authors have developed
suboptimal processors to avoid the drawbacks of the GLRT. In
[1], a suboptimal approximation for the detection of continuous-time,
stationary processes in low SNR was proposed, assuming
that the channel parameters are random variables. This processor
was based on a Taylor series approximation of the likelihood
ratio for the processor with known channel parameters. In [2],
a geometric framework based on multiresolution techniques was
proposed, where the set of all possible signals arriving at the receiver
is approximated by a simpler linear subspace. The detectors
proposed in [1] and [2] are both quadratic processors.

This paper extends the work of [1] to short-duration stochastic
transients, which are nonstationary in nature. The following
situation is considered: i) the processor is developed in discrete
time; ii) the multipath channel is regarded as a "signal amplifier",
instead of a nuisance; iii) the low SNR condition assumes that the
signal eigenvalues are smaller than the noise ones. The solutions
proposed have two major concerns. First, they must rely on a
small amount of a-priori statistical information about the channel
parameters. The basic idea is that this information should be
easily inferred from local data (i.e., depth, salinity, temperature
in an underwater medium) and, when the local conditions change,
the processor parameters must be recalculated quickly. The simulation
results show that the processors are robust to mismatches in
the channel parameters, and thus only mild information about the
range of the delay coefficients and approximate estimates of the attenuation
coefficient means and variances are necessary. Second,
the computational cost of the resulting processors must be low,
allowing for real-time processing. Two possible processing structures
are proposed: i) a quadratic form, which can be directly optimized
in terms of a performance/computational complexity compromise
using the methods proposed in [3]; ii) a recursive solution,
where a sequence of tests is performed at the arrival of each signal
replica. In most situations, this scheme reduces both the computational
cost of the processor and the mean time interval between
the arrival of the first replica at the receiver and its detection time
instant.

2.  PROBLEM  FORMULATION 

The detection problem is formulated as a simple binary test. The
channel is modeled such that the signal arriving at the receiver
under hypothesis $H_1$ is a weighted sum of delayed replicas of the
emitted signal. Thus, the observation process $r(t)$ is defined as

$$r(t) = \begin{cases} y(t) + n(t), & \text{under hypothesis } H_1 \\ n(t), & \text{under hypothesis } H_0, \end{cases} \qquad (1)$$

where

$$y(t) = \alpha_1 s(t - T_0) + \sum_{k=2}^{N_q} \alpha_k s(t - T_0 - \tau_k) \qquad (2)$$

$$\Rightarrow\; y(t + T_0) = \sum_{k=1}^{N_q} \alpha_k s(t - \tau_k), \quad \text{with } \tau_1 = 0.$$

In (2), $T_0$, $\alpha_k$ and $\tau_k$, $k = 1, \ldots, N_q$, denote, respectively, the time
delay between the emission and the reception of the first replica of
a signal $s(t)$, the attenuation coefficients (AC) and the delay coefficients
(DC), where $N_q$ represents the number of replicas arriving
at the receiver. The emitted signal, $s(t)$, is a Gaussian, zero-mean
transient with autocorrelation function $k_s(t_1, t_2)$. The noise,
$n(t)$, is stationary, Gaussian distributed with zero mean and with
constant power spectrum up to a high frequency. The observation
process is conveniently filtered and discretized with a sampling
period $T_s$. For simplicity, we assume in the sequel that the
sampled noise sequence is white, and that the DCs, $\tau_k$, are integer
multiples of $T_s$, i.e., $\tau_k = q_k T_s$, $k = 1, \ldots, N_q$. The optimal
detector in this case corresponds to the likelihood test

Eq.a\pHAr\Huq,a)}  _ 

Eq.ot  \ph0  (r  I  Ho,  q,  a)]  9'a 

(3) 


pii1(r\H1,q,a) 

PH0{r\H0) 


n  1 

Hr, 
where $q = \{q_1, \ldots, q_{N_q}\}$, $\alpha = \{\alpha_1, \ldots, \alpha_{N_q}\}$, $r = [r(T_s + T_0) \cdots r(N T_s + T_0)]^T$
corresponds to the observation vector over
an interval of length $N$ and $p_{H_i}(r | H_i, q, \alpha)$ represents the probability
density function of $r$ given $q$ and $\alpha$, under hypothesis $H_i$,
$i = 0, 1$. It is assumed that the interval $N$ is large enough to include
all the replicas of an emitted signal arriving at the receiver.
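
The discrete-time signal model can be illustrated with a short sketch that builds the received sequence as a weighted sum of delayed replicas, with integer delays $q_k$ and the first delay equal to zero. The transient samples, attenuations and delays below are made-up values, and the function name `received_signal` is hypothetical.

```python
def received_signal(s, attenuations, delays, n_obs):
    """y(n) = sum_k alpha_k * s(n - q_k), restricted to n_obs samples."""
    y = [0.0] * n_obs
    for alpha, q in zip(attenuations, delays):
        for i, value in enumerate(s):
            if i + q < n_obs:          # ignore samples outside the window
                y[i + q] += alpha * value
    return y


# Hypothetical example: a 3-sample transient and N_q = 3 replicas.
s = [1.0, -2.0, 0.5]
alphas = [1.0, 0.5, -0.25]
delays = [0, 2, 5]                     # q_1 = 0 for the first arrival
y = received_signal(s, alphas, delays, n_obs=10)
```

In this example the first two replicas overlap at sample 2, while the third arrives after the others have ended; the observation window of 10 samples is long enough to contain all of them.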


3.  LIKELIHOOD  RATIO  FOR  KNOWN  AC  AND  DC 


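Since $y(t)$ consists of a sum of zero-mean Gaussian-distributed
signals, the joint probability density function of $r$ under both
hypotheses is also Gaussian. Thus, the term inside brackets in (3)
may be rewritten as

$$l(r) = \frac{p_{H_1}(r | H_1, q, \alpha)}{p_{H_0}(r | H_0)} = \frac{|C_{H_0}|^{1/2}}{|C_{H_1}|^{1/2}} \exp\left[\frac{1}{2}\, r' \left(C_{H_0}^{-1} - C_{H_1}^{-1}\right) r\right], \qquad (4)$$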

where $C_{H_i}$ corresponds to the covariance matrix of dimension
$(N \times N)$ of $r$ under hypothesis $H_i$ and $|\cdot|$ denotes the determinant.
Let $\sigma^2$ be the variance of the discretized noise and $C_y$ the
covariance matrix of the discrete received signal, $y$, under hypothesis
$H_1$. Then, $C_{H_0} = \sigma^2 I$ and $C_{H_1} = \sigma^2 I + C_y$ ($I$ represents
the identity matrix). Denote by $C_s$ the $(N_1 \times N_1)$ covariance
matrix of the discretized emitted transient signal $s(t)$, where $N_1$
represents the interval where most of the signal energy lies. Consider
the decomposition $C_s = V_s D V_s'$, where $V_s$ $(N_1 \times N_1)$
and $D = \mathrm{diag}\{\lambda_1, \ldots, \lambda_{N_1}\}$ are, respectively, the eigenvector
and eigenvalue matrices of $C_s$. Under these conditions, we have


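$$C_y = V_y D V_y' \qquad (5)$$

with

$$V_y = \sum_{k=1}^{N_q} \alpha_k V_s^k \qquad (6)$$

and

$$V_s^k = \begin{bmatrix} o(q_k, N_1) \\ V_s \\ o(N - N_1 - q_k, N_1) \end{bmatrix}, \qquad (7)$$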

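where $o(n, m)$ stands for the $(n \times m)$ null matrix. It is easy to
show [4] that the likelihood ratio (4) can be rewritten as

$$l(r) = \exp\left[\frac{1}{2} \ln\left(|I - \sigma^2 V_y' V_y W_y|\right) + \frac{1}{2}\, r' V_y W_y V_y' r\right], \qquad (8)$$

with

$$W_y = D_2 (U_1 D_2 + I)^{-1} / \sigma^2 \qquad (9)$$

$$U_1 = \sum_{k=1}^{N_q} \sum_{\substack{l=1 \\ l \neq k}}^{N_q} \alpha_k \alpha_l (V_s^k)' V_s^l \qquad (10)$$

$$D_2 = (c_1 I + \sigma^2 D^{-1})^{-1}, \qquad (11)$$

where $c_1 = \sum_{k=1}^{N_q} \alpha_k^2$.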


Note that the optimal detector is obtained by taking the expected
value of (8) with respect to the ACs and DCs. However, the
resulting processor has no closed-form expression and requires a
huge computational load. To overcome this difficulty, we simplify
(8) by taking the terms up to the first order of its Taylor series
around a convenient working point. Under the conditions that the
SNR is low (i.e., $\lambda_k < \sigma^2$, $\forall k = 1, \dots, N_1$)
and the multipath channel amplifies the signal energy arriving at
the receiver (i.e., $c_1 > 1$), the elements of the diagonal of the
matrix $D_2$ are small. Thus, the first-order approximation of the
likelihood ratio around the point $D_2 = o(N_1, N_1)$ is

$$l(r) \approx 1 - \tfrac{1}{2} \mathrm{tr}\{ V_y D_2 V_y' \} + \frac{1}{2\sigma^2}\, r' V_y D_2 V_y' r = \pi(r). \qquad (12)$$
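
As a reading aid, the first-order expansion can be sketched as follows. This is a hedged reconstruction that keeps only terms linear in $D_2$ and uses the elementary approximations $\ln|I - A| \approx -\mathrm{tr}\{A\}$ and $e^{x} \approx 1 + x$ for small A and x.

$$
\begin{aligned}
W_y &= D_2 (U_1 D_2 + I)^{-1} / \sigma^2 \approx D_2 / \sigma^2, \\
\tfrac{1}{2} \ln\left| I - \sigma^2 V_y' V_y W_y \right| &\approx -\tfrac{1}{2}\, \mathrm{tr}\{ V_y' V_y D_2 \} = -\tfrac{1}{2}\, \mathrm{tr}\{ V_y D_2 V_y' \}, \\
\tfrac{1}{2}\, r' V_y W_y V_y' r &\approx \frac{1}{2\sigma^2}\, r' V_y D_2 V_y' r .
\end{aligned}
$$

Adding the two exponent terms and linearising the exponential gives the three terms of the approximated likelihood ratio.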

When the DCs and the ACs are known, $\pi(r)$ still represents the
optimum solution when there is no overlap between the replicas
arriving at the receiver, since $U_1 = o(N_1, N_1)$. This situation
corresponds to the case where the duration of the emitted signal is
small compared with the channel delays. However, the examples
presented in [4] show that, even when there is a large overlap
between replicas, the power of the signal arriving at the receiver
increases and the performance of the processor does not degrade
significantly.

In the next section the suboptimal processor is derived by taking
the expected value, with respect to the ACs, a, and DCs, q, of the
approximated likelihood ratio $\pi(r)$ (12).


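4. EXPECTED VALUE OF $\pi(r)$


In the sequel, the elements of
$\{a_1, \dots, a_{N_q}, q_1, \dots, q_{N_q}\}$ are assumed to be
mutually independent. By taking the expected value with respect to
the ACs, a, one gets

$$E_a[\pi(r)] = 1 + \frac{1}{2\sigma^2}\, r' E_a\left[ V_y D_2 V_y' \right] r - \tfrac{1}{2}\, \mathrm{tr}\left\{ E_a\left[ V_y D_2 V_y' \right] \right\}, \qquad (13)$$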

where tr{X} denotes the trace of X. Since $D_2$ depends on a
through $c_1$, the expected value in (13) should be evaluated using
the joint probability density function of a. However, if we assume
that $c_1 = \bar{c}_1$ is approximately constant, it is only
necessary to know the first and second order statistics of a, i.e.,

$$E_a[V_y D_2 V_y'] = \sum_{k_1=1}^{N_q} \sum_{k_2=1}^{N_q} C_{a_{k_1} a_{k_2}} V_s^{q_{k_1}} D_2 (V_s^{q_{k_2}})', \qquad (14)$$

where $C_{a_{k_1} a_{k_2}} = E[a_{k_1} a_{k_2}]$ represents the
cross-correlation between $a_{k_1}$ and $a_{k_2}$. The constant
$\bar{c}_1$ is for now left as a free parameter that will be chosen
in order to maximize the performance of the final expression of the
processor.

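The final expression of the likelihood test is obtained by taking
the expected value of (13) with respect to the DCs, q, i.e.,

$$\pi_p(r) \underset{H_0}{\overset{H_1}{\gtrless}} \mu, \qquad (15)$$

where the threshold $\mu$ includes the terms of (13) that do not
depend on the observation process r, and

$$\pi_p(r) = r' \left\{ \sum_{k_1=1}^{N_q} \sum_{k_2=1}^{N_q} C_{a_{k_1} a_{k_2}} E_{q_{k_1}, q_{k_2}}\left[ V_s^{q_{k_1}} D_2 (V_s^{q_{k_2}})' \right] \right\} r. \qquad (16)$$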

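When $k = k_1 = k_2$ we have

$$C_k = E_{q_k}\left[ V_s^{q_k} D_2 (V_s^{q_k})' \right] = \sum_{m} V_s^{m} D_2 (V_s^{m})' P_k(m), \qquad (17)$$

where $P_k(m)$ denotes the probability function of $q_k$, and for
$k_1 \neq k_2$,

$$E_{q_{k_1}, q_{k_2}}\left[ V_s^{q_{k_1}} D_2 (V_s^{q_{k_2}})' \right] = \bar{V}_{k_1} D_2 \bar{V}_{k_2}', \qquad (18)$$

with

$$\bar{V}_k = \sum_{m} V_s^{m} P_k(m). \qquad (19)$$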


The probability $P_k(m)$, related to the DC $q_k$, corresponds to an
uncertainty measure of the time interval between the arrival of the
first and the k-th replica at the receiver. Thus, as noted before,
$q_1 = 0$ and $P_1(m) = \delta_m$ (the Kronecker delta). The
quadratic form of the likelihood ratio is given by

$$\pi_p(r) = r' \left\{ \sum_{k=1}^{N_q} C_{a_k} C_k + \sum_{k_1=1}^{N_q} \sum_{\substack{k_2=1 \\ k_2 \neq k_1}}^{N_q} \bar{a}_{k_1} \bar{a}_{k_2} \bar{V}_{k_1} D_2 \bar{V}_{k_2}' \right\} r, \qquad (20)$$

where $C_{a_k} = C_{a_k a_k} = \bar{a}_k^2 + \sigma_k^2$
($\bar{a}_k$ and $\sigma_k^2$ being, respectively, the mean and
variance of $a_k$).

In (20), matrices $C_k$ and $D_2$ depend on the constant $c_1$,
which is tuned in order to maximize the detector performance, using
the following procedure: i) for each possible value of $c_1$, let M
be the matrix inside brackets in (20), and determine the covariance
matrix $C'_{H_1}$ that corresponds to a hypothesis for which the
processor (20) is optimal, i.e.,
$M = C_{H_0}^{-1} - (C'_{H_1})^{-1}$ ($C_{H_0} = \sigma^2 I$); ii)
choose the value of $c_1$ that maximizes either the Chernoff or the
Bhattacharyya distance [5] between $C'_{H_1}$ and $C_{H_0}$. This
procedure corresponds to minimizing a bound for the probability of
error between both hypotheses and is computationally efficient.

The processor presented in (20) is a quadratic form. Its
computational cost is relatively low compared with the GLRT,
because the matrix M can be computed off-line when the conditions of
the multipath channel change. Although the processor depends on the
probability functions of the delay coefficients, which may not be
easy to determine, the simulation studies show that the receiver
performance is robust to mismatches in these probabilities. Thus,
when the true $P_k(m)$ are unknown, the processor is designed
assuming uniform probabilities and, in general, the resulting
performance degradation is not significant.


5. SEQUENTIAL DETECTION OF REPLICAS

In (20), the length N of the observation vector r must be large
enough to include all replicas arriving at the receiver. Therefore,
it is only possible to detect a signal when its last replica
arrives. This section develops a sequential structure based on (20)
that allows the detection of a signal without waiting for all
replicas. In this case, the detection is performed as soon as an
arbitrary number of replicas possess a sufficient amount of energy
to ensure with high probability that a signal was emitted by the
source. Furthermore, in many situations, the sequential detection
of replicas may reduce the computational cost of the processor.

In (20), matrices $C_k$ and $\bar{V}_k$ have dimensions
$(N \times N)$ and $(N \times N_1)$. However, the dimension of the
non-null terms of each of these matrices is significantly lower
than N, both because of the transient character of the emitted
signal and because the probabilities $P_k(m)$ associated with each
replica are assumed to have compact support. Let
$[\theta_k^i; \theta_k^f]$ be the support of $P_k(m)$ (with
$\theta_1^i = \theta_1^f = 0$); then, the matrices $C_k$ and
$\bar{V}_k$ have, respectively, support
$[\theta_k^i + 1; \theta_k^f + N_1] \times [\theta_k^i + 1; \theta_k^f + N_1]$
and $[\theta_k^i + 1; \theta_k^f + N_1] \times [1; N_1]$.
Defining $N_{q_k} = \theta_k^f - \theta_k^i + N_1$, we denote by
$C'_k$ $(N_{q_k} \times N_{q_k})$ and $\bar{V}'_k$
$(N_{q_k} \times N_1)$ the matrices that include only the elements
belonging to the support of $C_k$ and $\bar{V}_k$. If, at each time
instant n, the vector of the last $N_{q_k}$ elements of the
observation process is represented by
$r_k(n) = [r(n - N_{q_k} + 1) \cdots r(n)]'$, then expression (20)
may be rewritten as


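$$\pi_p(r) = \sum_{k=1}^{N_q} (\bar{a}_k^2 + \sigma_k^2)\, l_k(\theta_k^f + N_1) + 2 \sum_{k_1=2}^{N_q} \sum_{k_2=1}^{k_1-1} \bar{a}_{k_1} \bar{a}_{k_2}\, b_{k_1}'(\theta_{k_1}^f + N_1)\, D_2\, b_{k_2}(\theta_{k_2}^f + N_1), \qquad (21)$$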


where

$$l_k(n) = r_k'(n) Z_k D_k^{l} Z_k' r_k(n) \quad \text{and} \quad b_k(n) = (\bar{V}'_k)' r_k(n). \qquad (22)$$

In (22), $Z_k$ and $D_k^{l}$ represent, respectively, the matrices of
eigenvectors and eigenvalues of $C'_k$
($C'_k = Z_k D_k^{l} Z_k'$). Note that, since $q_1 = 0$ and
$P_1(m) = \delta_m$, $Z_1 = V_s$ and $D_1^{l} = D_2$. Under these
conditions, the likelihood ratio may be rewritten in a recursive
form:

$$\pi_h(n) = \pi_{h-1}(n - \theta_h^f + \theta_{h-1}^f) + (\bar{a}_h^2 + \sigma_h^2)\, l_h(n) + 2 \bar{a}_h \left[ \sum_{k=1}^{h-1} \bar{a}_k\, b_k'(n - \theta_h^f + \theta_k^f) \right] D_2\, b_h(n), \qquad (23)$$

with $\pi_1(n) = (\bar{a}_1^2 + \sigma_1^2)\, l_1(n)$. For
$h = N_q$, $\pi_{N_q}(n)$ corresponds to expression (20).

Comparing expressions (20) and (23) from a computational point of
view, the solution that leads to a less expensive processor depends
both on the signals to detect and on the multipath channel
structure. In general, however, two situations may arise. First,
when there is a large overlap between replicas, the recursive
processor (23) requires $N_q$ eigenvector decompositions of length
$N_{q_k}$, while the non-recursive form (20) needs only one
eigenvector decomposition of length N. Since, in this case,
$N \ll \sum_k N_{q_k}$, we should expect the non-recursive solution
to be less expensive than the recursive one. However, this may not
be true because, in most cases, the number of relevant eigenvalues
is much larger in the non-recursive processor, thus increasing its
computational cost. When the overlap is smaller, the recursive
solution clearly becomes more attractive than the non-recursive one.

From the recursive processor (23), it is possible to derive a
detection structure suited to sequential detection of replicas. For
every $h = 1, \dots, N_q - 1$, two likelihood tests are performed.
The first test compares $\pi_h(n)$ with a high-valued threshold
$\mu_h^{\sup}$: if $\pi_h(n) > \mu_h^{\sup}$, it is assumed that a
signal has arrived at the receiver, with a low probability of false
alarm, and there is no need to evaluate
$\pi_{h+1}(n + \theta_{h+1}^f - \theta_h^f)$, which corresponds to
the arrival of the next replica. If $\pi_h(n) < \mu_h^{\sup}$,
another test is performed against a low threshold $\mu_h^{\inf}$: if
$\pi_h(n) < \mu_h^{\inf}$, we consider that no signal is present at
the receiver (with low miss probability) and the procedure stops.
Only in the case where $\mu_h^{\inf} < \pi_h(n) < \mu_h^{\sup}$ does
the processor wait for the next replica to make a decision. When
$h = N_q$, only one final test is performed.
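
The two-threshold logic described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `sequential_detect` and its arguments are hypothetical, and the statistics are passed in as a precomputed list for simplicity, whereas a real receiver would evaluate each one only when the corresponding replica has arrived.

```python
from typing import Sequence, Tuple


def sequential_detect(stats: Sequence[float],
                      upper: Sequence[float],
                      lower: Sequence[float],
                      final_threshold: float) -> Tuple[str, int]:
    """Two-threshold sequential test over the replicas.

    stats[h-1] is the statistic available after replica h (h = 1..Nq);
    upper[h-1] and lower[h-1] are the high and low thresholds used for
    h = 1..Nq-1. Returns the decision and the number of replicas used.
    """
    n_q = len(stats)
    for h in range(1, n_q):
        value = stats[h - 1]
        if value > upper[h - 1]:
            return "signal", h        # early detection, no need to wait
        if value < lower[h - 1]:
            return "no signal", h     # early rejection, procedure stops
        # otherwise: undecided, wait for the next replica
    # Last replica: a single final test is performed.
    decision = "signal" if stats[-1] > final_threshold else "no signal"
    return decision, n_q
```

The second element of the returned pair indicates how many replicas were needed, which is the quantity that drives the computational saving of the sequential structure.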
6. SIMULATION RESULTS

The received signal is a weighted sum of 10 delayed replicas of a
chirplike stochastic transient with autocorrelation function shown
in figure 1. The noise variance is $\sigma^2 = 5$. The mean value
for the overlapping between consecutive replicas, $\Delta_q$, is 80%
or 20%.


Fig. 1. Autocorrelation function of the emitted signal.


Fig. 2. ROCs: a) $\Delta_q$ = 80%. b) $\Delta_q$ = 20%.


Fig. 3. Statistics mismatch: a) $\Delta_q$ = 80%. b) $\Delta_q$ = 20%.


The DCs are generated from a Gaussian probability function with
1/3rd of the length of the emitted signal, while the ACs have
$\bar{a}_k = 1$ and $\sigma_k = 0.2$, $\forall k = 1, \dots, 10$.
The receiver operating characteristics (ROC) are obtained by
simulation with 5000 Monte-Carlo runs. In figures 2 a) and b), the
performance of the proposed detector (PROPOSED) is compared with i)
a reference one (REFERENCE), consisting of the best possible
quadratic processor obtained from the covariance matrix estimated
from the 5000 signals arriving at the receiver; and ii) a detector
that assumes that the DCs and ACs take fixed values (FIXED), equal
to the mean values of the real channel parameters. We conclude that,
for both overlapping situations, the performance of the proposed
solution is very close to the best quadratic processor and presents
a huge gain compared with the fixed-parameters one. Figures 3 a)
and b) show the performance degradation due to mismatch in the
statistics of the DCs. For FP = 0 the DCs probability function is
known, while for FP = 1 a uniform probability with the support of
the true one is used. When SH = 5, a shift (25 %) of the support of
the DCs probability function is considered in the temporal
localization of the DCs. If SH = 0, no mismatch exists. For a large
overlapping between replicas ($\Delta_q$ = 80%), the proposed
processor proves to be robust to statistics mismatch. However, for
a smaller overlapping, figure 3 b) shows some sensitivity to shift
mismatch. In both cases, the assumption of a uniform probability
function introduces only a small degradation in the processor.


Table 1. Sequential detection of replicas

Scenario    PFA       PD        CC
i)          0.1810    0.7102    18.16%
ii)         0.1064    0.8374    65.31%
iii)        0.1246    0.8140    41.59%

Regarding the sequential detection of replicas, and for a mean
overlapping between replicas of 50%, three scenarios are
considered: i) a small upper threshold $\mu^{\sup}$ and a large
lower threshold $\mu^{\inf}$; ii) the opposite of i); and iii) an
intermediate situation, between i) and ii). Table 1 shows the
probabilities of detection (PD) and of false alarm (PFA), and the
percentage of computational complexity (CC) needed, compared with
the processor that waits for the arrival of all the 10 replicas,
for which PD = 0.8442 and PFA = 0.1. In situation i) the CC is
reduced drastically but the PD and the PFA also suffer a
significant degradation. In this case, detection is performed at
the arrival of few replicas. In situation ii), although there is
only a small loss of performance, the CC is still reduced to 65.31%.

7.  CONCLUSION 

The optimal processor for stochastic transient signal detection in
multipath environments is, in general, computationally intractable.
This  paper  presents  a  computationally  efficient  suboptimal  solu¬ 
tion,  where  the  multipath  parameters  are  modelled  as  random  vari¬ 
ables.  A  structure  for  sequential  detection  of  replicas  is  proposed, 
avoiding  the  need  to  wait  for  all  replicas  to  make  a  decision.  The 
proposed  solution  is  robust  to  the  multipath  statistics  mismatch, 
thus requiring only mild a-priori information about the channel.

ABSTRACT

Two  detectors  of  symmetrically  distributed  independent  tim¬ 
ing  jitter  in  a  data  record  composed  of  a  complex  harmonic 
in  additive  white  Gaussian  noise  are  proposed.  The  pro¬ 
posed  detectors  are  computationally  efficient  and,  although 
they  are  formulated  using  asymptotic  results,  they  may  be 
effectively  used  with  small  sample  lengths  under  a  wide 
range  of  conditions.  The  performances  of  the  detectors  are 
analysed  using  simulations  and  theoretical  results. 

1.  INTRODUCTION 

Nearly  all  sampling  systems  exhibit  timing  jitter  wherein 
the  spacing  between  sampling  instants  is  not  uniform  but 
varies  about  the  nominal  sampling  period  in  a  random  fash¬ 
ion.  In  most  cases  the  deviations  from  the  nominal  sampling 
period  are  small  enough  that  they  may  be  ignored.  However, 
there are cases in which the effects of jitter can become
significant [1, 2]. The signal model considered in this paper
is a complex harmonic in additive noise which may or may
not have randomly spaced sampling instants. Specifically,
we consider observations from

$$X_t = g_0 \exp[j\{\omega_0 (t + U_t) + \psi\}] + W_t, \quad t \in \mathbb{Z}, \qquad (1)$$

where $g_0 \in \mathbb{R}^+$, $-\pi < \omega_0 < \pi$,
$\omega_0 \neq 0$, $-\pi < \psi < \pi$, $U_t$, $t \in \mathbb{Z}$,
are zero-mean real-valued independent and identically distributed
(iid) random variables with variance $\sigma_U^2$, and $\{W_t\}$ is
a circular complex-valued white normal random process with variance
$\sigma_W^2$, independent of $\{U_t\}$. It is assumed that
$\{U_t\}$ is symmetrically distributed and has a characteristic
function $\phi_U(s) = E \exp(jsU_t)$ which converges for
$s = k\omega_0$, $k = 1, \dots, 4$. We aim to provide a solution to
the following problem: given observations $x_0, \dots, x_{n-1}$
from (1), test the hypothesis $H: \sigma_U = 0$ against the
alternative $K: \sigma_U > 0$. Under this formulation, a decision
for the alternative indicates the presence of timing jitter. It is
assumed that all signal and noise parameters and the values
$u_0, \dots, u_{n-1}$ of the timing offsets are unknown. A method
of checking for the presence of timing

jitter  is  desirable  as  a  diagnostic  tool  for  evaluating  the  per¬ 
formance  of  sampling  systems.  A  jitter  detector  can  also  be 
used  to  select  estimation  procedures  since  the  optimal  esti¬ 
mation  procedures  will  differ  depending  on  whether  or  not 
jitter  exists. 

Previous work on detecting timing jitter has been performed by
Sharfer and Messer [6, 7]. In that case the presence of jitter in
the sampling of a band-limited zero-mean stationary random process
with non-zero third-order cumulants was found to be characterised
by non-nullity of the bispectrum in a certain region. This
observation precipitated the formulation of an appropriate
statistical test. The bispectrum jitter detector cannot be used for
the problem considered here because the random process $\{X_t\}$ of
(1) does not satisfy the required assumptions. No jitter detectors
for the signal model considered here have been proposed in the
literature. In this paper two test statistics based on estimators
of $\omega_0^2 \sigma_U^2$ are proposed. Starting from the same
result,  the  test  statistics  are  derived  using  different  assump¬ 
tions.  In  the  first  case  the  jitter  is  assumed  to  be  normally 
distributed  while  in  the  second  case  a  small  jitter  approxi¬ 
mation  is  used.  The  resulting  detectors  are  computationally 
efficient  and  maintain  the  nominal  false  alarm  probability, 
although,  for  small  sample  lengths,  mild  conditions  on  the 
signal-to-noise  ratio  are  required  for  the  normal  assumption 
detector.  The  effects  of  the  assumptions  made  in  the  formu¬ 
lation  of  the  two  detectors  are  studied  using  theoretical  and 
simulation  results. 

2.  PROPOSED  METHODS 

In  this  section  two  detectors  of  independent  timing  jitter  are 
proposed.  The  conditions  under  which  the  detectors  are 
consistent  are  established  and  examined  for  various  jitter 
distributions. 

After estimating the frequency $\omega_0$ and initial phase $\psi$ as

$$\hat{\omega}_0 = \arg\max_{-\pi < \omega < \pi} |d_X(\omega)|, \qquad (2)$$

$$\hat{\psi} = \angle d_X(\hat{\omega}_0), \qquad (3)$$

where $d_X(\omega)$ is the finite Fourier transform (FT) of the
sequence $x_0, \dots, x_{n-1}$,

$$d_X(\omega) = \sum_{t=0}^{n-1} x_t \exp(-j\omega t),$$

the observations are demodulated to form the sequence

$$y_t = x_t \exp\{-j(\hat{\omega}_0 t + \hat{\psi})\}, \quad t = 0, \dots, n-1.$$
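
A minimal sketch of this estimation and demodulation step is given below, using only the Python standard library. The maximisation in (2) is replaced here by a coarse grid search, which is an assumption made for brevity (a practical implementation would use an FFT followed by a local refinement); the names `finite_ft`, `demodulate` and `grid_size` are hypothetical.

```python
import cmath
import math


def finite_ft(x, omega):
    """Finite Fourier transform d_X(omega) = sum_t x_t exp(-j omega t)."""
    return sum(xt * cmath.exp(-1j * omega * t) for t, xt in enumerate(x))


def demodulate(x, grid_size=4096):
    """Estimate frequency and phase from the FT peak, then demodulate.

    The frequency estimate is the grid point in (-pi, pi) that
    maximises |d_X(omega)|; the phase estimate is the angle of d_X
    at that point.
    """
    omegas = [-math.pi + 2.0 * math.pi * (i + 0.5) / grid_size
              for i in range(grid_size)]
    omega_hat = max(omegas, key=lambda w: abs(finite_ft(x, w)))
    psi_hat = cmath.phase(finite_ft(x, omega_hat))
    y = [xt * cmath.exp(-1j * (omega_hat * t + psi_hat))
         for t, xt in enumerate(x)]
    return omega_hat, psi_hat, y
```

For a noiseless, jitter-free harmonic the demodulated samples cluster around the real amplitude $g_0$; jitter and noise spread them around that point, which is what the moment-based statistics exploit.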

Although the estimators $\hat{\omega}_0$ and $\hat{\psi}$ are
optimal only if no timing jitter exists, it has been shown in [3]
that, even for large values of $\sigma_U^2$, these estimators have
variances close to the Cramer-Rao bound for the case of normally
distributed jitter. Under the given assumptions it is shown in [3]
that $\hat{\omega}_0 = \omega_0 + O_m(n^{-3/2})$ and
$\hat{\psi} = \psi + O_m(n^{-1/2})$ so that

$$Y_t = g_0 \exp(j\xi_t) + V_t + O_m(n^{-1/2}), \qquad (4)$$

where $\xi_t = \omega_0 U_t$ and
$V_t = W_t / \exp\{j(\omega_0 t + \psi)\}$. Since it is assumed
that $\omega_0 \neq 0$, the variance $\sigma_\xi^2$ of $\xi_t$ will
be zero only if $\sigma_U = 0$. Therefore, non-nullity of
$\sigma_\xi^2$ can be used to test for the presence of jitter.

In the following, terms in (4) which disappear as the sample length
$n \to \infty$ will be ignored. Such terms do not affect the
asymptotic analysis. Let $r_{kY} = E\,\mathrm{Re}(Y_t)^k$ and
$i_{kY} = E\,\mathrm{Im}(Y_t)^k$, $k = 1, 2, \dots$ It is
straightforward to show that

$$\sqrt{r_{2Y} - i_{2Y}} \,/\, r_Y = \sqrt{\phi_\xi(2)} \,/\, \phi_\xi(1), \qquad (5)$$

where $\phi_\xi(s) = E \exp(js\xi_t)$ is the characteristic
function of $\{\xi_t\}$. Eq. (5) may be used as the basis of an
asymptotically unbiased estimator of $\sigma_\xi^2$, provided that
knowledge of the distribution of the jitter is available. Since the
jitter distribution is assumed to be unknown, an estimator of
$\sigma_\xi^2$ which is, in general, unbiased cannot be obtained
from (5). However, since we are concerned only with jitter
detection, an unbiased estimator of jitter variance is not a
necessity. With this in mind, one approach is to assume a
distribution for the jitter and derive an estimator of
$\sigma_\xi^2$ on this basis. Another approach is to assume small
amounts of jitter and replace the characteristic function
$\phi_\xi(s)$ by its second-order approximation. A
distribution-independent estimator can then be derived. Both of
these approaches require that the performances of the resulting
detectors are carefully studied for a range of jitter
distributions. This analysis will be performed in Section 3.
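
The moment identity relating the real and imaginary parts of $Y_t$ to the characteristic function of the jitter is stated without proof; one way to see it is sketched below. The sketch assumes that $\xi_t$ is symmetric (so $E \sin \xi_t = 0$ and $\phi_\xi$ is real), that $V_t$ is circular with variance $\sigma_W^2$ (so its real and imaginary parts are zero-mean with variance $\sigma_W^2/2$ each), and that $\xi_t$ and $V_t$ are independent.

$$
\begin{aligned}
r_Y &= E\,\mathrm{Re}(Y_t) = g_0\, E \cos \xi_t = g_0\, \phi_\xi(1), \\
r_{2Y} &= g_0^2\, E \cos^2 \xi_t + \sigma_W^2 / 2, \qquad
i_{2Y} = g_0^2\, E \sin^2 \xi_t + \sigma_W^2 / 2, \\
r_{2Y} - i_{2Y} &= g_0^2\, E \cos 2\xi_t = g_0^2\, \phi_\xi(2).
\end{aligned}
$$

Dividing the square root of the last line by the first removes the unknown amplitude $g_0$. Under the null hypothesis $\xi_t = 0$, so $r_Y = g_0$ and $i_{2Y} = \sigma_W^2/2$, and the ratio $r_Y^2 / (2 i_{2Y})$ equals $g_0^2 / \sigma_W^2$.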

2.1.  Normal  Assumption 

We will proceed from (5) as if the jitter is normally distributed.
The normal distribution is chosen for the simple derivation it
affords. Importantly, this choice does not prohibit the use of the
proposed detector for other jitter distributions, as will be
demonstrated in Section 3. Substituting
$\phi_\xi(s) = \exp(-s^2 \sigma_\xi^2 / 2)$ into (5) gives

$$\exp(-\sigma_\xi^2 / 2) = \sqrt{r_{2Y} - i_{2Y}} \,/\, r_Y.$$

Simple re-arrangements result in the following:

$$\sigma_\xi^2 = 2 \log\left( r_Y / \sqrt{r_{2Y} - i_{2Y}} \right).$$

Since the moments of the real and imaginary parts of $Y_t$ are
unavailable, they are replaced by their sample estimators,

$$\hat{r}_{kY} = \frac{1}{n} \sum_{t=0}^{n-1} \mathrm{Re}(Y_t)^k, \qquad (6)$$

$$\hat{i}_{kY} = \frac{1}{n} \sum_{t=0}^{n-1} \mathrm{Im}(Y_t)^k, \qquad (7)$$

to form the estimator

$$\hat{\sigma}_\xi^2 = 2 \log\left( \hat{r}_Y / \sqrt{\hat{r}_{2Y} - \hat{i}_{2Y}} \right).$$

Under H, $\sqrt{n}\, \hat{\sigma}_\xi^2 \sim N(0, \sigma_W^4 / g_0^4)$.
Standardisation leads to the statistic
$\hat{T} = \sqrt{n}\, \hat{r}_Y^2\, \hat{\sigma}_\xi^2 / (2 \hat{i}_{2Y})$,
which is asymptotically standard normal under H.
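
The computation of the normal-assumption statistic from demodulated samples can be sketched as follows. The function names are hypothetical, and raising an error when the sample moments make the logarithm or square root undefined is a simplification chosen here, not something prescribed by the text.

```python
import math


def sample_moments(y):
    """Sample estimators of r_Y, r_2Y and i_2Y from complex samples y."""
    n = len(y)
    r1 = sum(v.real for v in y) / n
    r2 = sum(v.real ** 2 for v in y) / n
    i2 = sum(v.imag ** 2 for v in y) / n
    return r1, r2, i2


def normal_assumption_statistic(y):
    """Standardised statistic built from 2*log(r_Y / sqrt(r_2Y - i_2Y))."""
    n = len(y)
    r1, r2, i2 = sample_moments(y)
    diff = r2 - i2
    if r1 <= 0.0 or diff <= 0.0:
        # The estimator would be complex-valued or undefined.
        raise ValueError("sample moments outside the admissible region")
    sigma2_hat = 2.0 * math.log(r1 / math.sqrt(diff))
    return math.sqrt(n) * r1 ** 2 * sigma2_hat / (2.0 * i2)
```

As a small hand-checkable case, the four samples 1+0.5j, 1-0.5j, 1+0.5j, 1-0.5j give sample moments 1, 1 and 0.25, so the statistic equals 4 times -log(0.75).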

2.2.  Small  Jitter  Approximation 

Under the small jitter approximation and assuming symmetrically
distributed jitter, $\phi_\xi(s) = 1 - s^2 \sigma_\xi^2 / 2$ is
substituted into (5) giving

$$\begin{aligned}
\sqrt{r_{2Y} - i_{2Y}} \,/\, r_Y &\approx \sqrt{1 - 2\sigma_\xi^2} \,/\, (1 - \sigma_\xi^2 / 2) \\
&\approx (1 - \sigma_\xi^2) \,/\, (1 - \sigma_\xi^2 / 2), \qquad (8)
\end{aligned}$$

where the second line is obtained by replacing the square root on
the right hand side with its first-order Taylor series
approximation. After replacing the moments with their sample
estimators and performing some simple manipulations the following
estimator of $\sigma_\xi^2$ is obtained:

$$\tilde{\sigma}_\xi^2 = \left( \hat{r}_Y - \sqrt{\hat{r}_{2Y} - \hat{i}_{2Y}} \right) \Big/ \left( \hat{r}_Y - \sqrt{\hat{r}_{2Y} - \hat{i}_{2Y}} \,/\, 2 \right).$$

Although $\tilde{\sigma}_\xi^2$ is, in general, a biased estimator
of $\sigma_\xi^2$, it is asymptotically unbiased when
$\sigma_U^2 = 0$, i.e. $E\, \tilde{\sigma}_\xi^2 = 0$ under H. Using
theorems given in [5] it can be shown that, under H,
$\sqrt{n}\, \tilde{\sigma}_\xi^2 \sim N(0, \sigma_W^4 / g_0^4)$. It
is not surprising to see that the asymptotic null distribution of
$\tilde{\sigma}_\xi^2$ is the same as that of
$\hat{\sigma}_\xi^2$. The statistic
$\tilde{T} = \sqrt{n}\, \hat{r}_Y^2\, \tilde{\sigma}_\xi^2 / (2 \hat{i}_{2Y})$
is asymptotically standard normal under H.

Jitter detectors based on the statistics $\hat{T}$ and $\tilde{T}$
will be developed using asymptotic results. Since the test
statistics are asymptotically standard normal under H, the null
hypothesis is rejected, i.e. it is decided that jitter is present,
if the test statistic exceeds $\Phi^{-1}(1 - \alpha)$ where
$\Phi(\cdot)$ is the standard normal distribution function and
$\alpha$ is the prescribed false alarm probability. The asymptotic
distributions of $\hat{T}$ and $\tilde{T}$ are derived in [4] under
the alternative. It is shown that, for the mean values of the
detectors to be real-valued, it is required that
$r_{2Y} - i_{2Y} > 0$, which corresponds to $\phi_\xi(2) > 0$. In
practice, reliable use of the detectors requires that
$\phi_\xi(2)$ is significantly non-zero so that the probability of
$\hat{r}_{2Y} - \hat{i}_{2Y} < 0$ occurring for a particular
realisation is small. This issue is considered in [4].
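
The small-jitter statistic and the one-sided decision rule can be sketched together as below. The names `small_jitter_statistic` and `jitter_detected` are hypothetical; the threshold uses `statistics.NormalDist().inv_cdf` from the Python standard library for the inverse standard normal distribution function, and the error raised for a non-positive moment difference is a simplification chosen here.

```python
import math
from statistics import NormalDist


def small_jitter_statistic(y):
    """Standardised statistic based on the small jitter approximation."""
    n = len(y)
    r1 = sum(v.real for v in y) / n
    r2 = sum(v.real ** 2 for v in y) / n
    i2 = sum(v.imag ** 2 for v in y) / n
    diff = r2 - i2
    if diff <= 0.0:
        # The square root would be complex-valued.
        raise ValueError("r_2Y - i_2Y must be positive")
    root = math.sqrt(diff)
    sigma2_tilde = (r1 - root) / (r1 - root / 2.0)
    return math.sqrt(n) * r1 ** 2 * sigma2_tilde / (2.0 * i2)


def jitter_detected(statistic, alpha):
    """Reject the no-jitter hypothesis if the statistic exceeds the
    (1 - alpha) quantile of the standard normal distribution."""
    return statistic > NormalDist().inv_cdf(1.0 - alpha)
```

With a nominal false alarm probability of 1 % the threshold is roughly 2.33, and with 5 % roughly 1.64, so a statistic of 2.0 leads to a detection only at the 5 % level.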

2.3.  Consistency 

The detectors will be consistent, i.e., the detection probability
tends to one as $n \to \infty$ for fixed parameter values, if

$$\phi_\xi(1) > \sqrt{\phi_\xi(2)} > 0. \qquad (9)$$

The inequality (9) holds for normally distributed jitter but not
necessarily for uniformly distributed jitter. In particular, if
$U_t \sim U[-a, a]$, (9) becomes
$\tan(|\omega_0| a) > |\omega_0| a$ with
$|\omega_0| a \in 2\pi k + (0, \pi/2)$, $k = 0, 1, \dots$. Of the
values of a which satisfy this inequality for a given frequency
$\omega_0$, of significant interest is the subset
$a < \pi / (2|\omega_0|)$. Other values of a which satisfy the
required inequality will be too large to be of practical interest.
In the most restrictive case, $|\omega_0| = \pi$ and the
requirement for consistency becomes $a < 1/2$. This is not a strict
condition since it allows values of a up to the point at which
time-reversals are possible. Note that if $|\omega_0| < \pi$, it is
possible to consistently detect uniformly distributed jitter even
if time-reversals occur with non-zero probability.

Finally it is noted that, in addition to (9), the jitter detector
derived under the small jitter approximation is consistent if, for
$\phi_\xi(2) > 0$,

$$\phi_\xi(1) < 0 \;\cup\; \left\{ \phi_\xi(1) > 0 \,\cap\, 2\phi_\xi(1) < \sqrt{\phi_\xi(2)} \right\}.$$

Therefore the jitter detector based on the small jitter
approximation is consistent under a wider range of conditions than
the jitter detector based on the normal assumption.
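
The consistency conditions can be checked numerically for uniformly distributed jitter, as in the sketch below. It rests on two assumptions: that for $U_t \sim U[-a, a]$ the variable $\xi_t = \omega_0 U_t$ is uniform on $[-b, b]$ with $b = |\omega_0| a$, whose characteristic function is $\sin(sb)/(sb)$; and that the conditions are read as "$\phi_\xi(1) > \sqrt{\phi_\xi(2)} > 0$" for the normal-assumption detector, with the additional region stated above for the small-jitter detector. The function names are hypothetical.

```python
import math


def phi_uniform(s, b):
    """Characteristic function of a variable uniform on [-b, b] (b > 0)."""
    return math.sin(s * b) / (s * b)


def consistency(b):
    """Return (normal_ok, small_jitter_ok) for uniform jitter, b = |w0| * a."""
    phi1 = phi_uniform(1.0, b)
    phi2 = phi_uniform(2.0, b)
    if phi2 <= 0.0:
        return False, False  # both detectors need phi(2) > 0
    root = math.sqrt(phi2)
    normal_ok = phi1 > root
    small_ok = normal_ok or phi1 < 0.0 or (phi1 > 0.0 and 2.0 * phi1 < root)
    return normal_ok, small_ok
```

For example, `consistency(1.0)` lies in the region of practical interest and satisfies both conditions, whereas a value such as `b = math.pi + 0.5` satisfies only the wider small-jitter condition.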

3.  PERFORMANCE  ANALYSIS 

This  section  contains  a  performance  analysis  and  compari¬ 
son  of  the  proposed  jitter  detectors.  We  first  verify  the  abil¬ 
ity  of  the  detectors  to  maintain  the  prescribed  false  alarm 
probability  for  finite  data  records.  In  the  second  part  of  this 
section  the  performances  of  the  detectors  are  analysed  for 
normal  and  uniform  jitter  distributions. 


3.1.  False  Alarm  Probability 

The false alarm probabilities of the jitter detectors are estimated
for various sample lengths and nominal false alarm probabilities of
1 % and 5 %. The frequency $\omega_0 = 1$, the initial phase
$\psi = 1/2$ and the signal-to-noise ratio (SNR), defined as
$S = g_0^2 / \sigma_W^2$, is set to 0 dB. For each scenario, 50 000
realisations of (1) are generated under the null hypothesis. The
results, shown in Table 1, indicate that both detectors maintain
the nominal false alarm probability in all cases. Although both
detectors are excessively conservative for small sample lengths, it
can be seen that the actual false alarm probabilities approach the
nominal false alarm probabilities as the sample length increases.

Table 1: Estimated false alarm probabilities for jitter detectors
based on the small jitter approximation (left) and the normal
assumption (right). The SNR S = 0 dB.

             alpha = 1 %               alpha = 5 %
log2(n)      small jitter   normal     small jitter   normal
5            0              0.47       0.25           2.95
6            0.01           0.42       1.08           3.23
7            0.06           0.50       1.97           3.70
8            0.19           0.60       2.84           4.08
9            0.34           0.71       3.27           4.22
10           0.59           0.88       3.77           4.48
11           0.62           0.82       4.14           4.61

An issue of some importance is the effect of SNR on the false alarm
probabilities. Simulation results given in [4] show that, for a
given sample length, the actual false alarm probability of the
detector based on the normal assumption increases as S decreases.
In fact the false alarm probability exceeds the set level for SNRs
below -3 dB although this exceedance becomes smaller as the sample
length increases. The false alarm probability of the detector based
on the small jitter approximation is less than the set level in all
cases, but does exhibit a slight increase in false alarm
probability as S increases.

3.2.  Detection  Probability 

The power of the two detectors is now examined using theoretical
and simulation results. In the first example the jitter is normally
distributed. The frequency $\omega_0 = 3/4$, the initial phase
$\psi = 1/2$ and the sample length n = 512. The variance of the
jitter $\sigma_U^2$ is varied between $10^{-3}$ and $10^{0}$ and
the SNR S is varied between 0 dB and 10 dB. In all of these cases
the probabilities of the test statistics being complex-valued are
negligible. Simulation results, obtained using 1000 realisations of
(1) for each scenario, are shown in Figure 1 for $\alpha = 0.01$.
The simulation results are accompanied by theoretical results
derived in [4].


Table 2: Legend for Figures 1 and 2.

[Table 2 columns: S (dB), Empirical, Theoretical.]


Figure 1: Detection probability (in %) of the jitter detectors
based on a) the small jitter approximation and b) the normal
assumption in the presence of normally distributed jitter. The
false alarm probability is 1 %. The legend is given in Table 2.

It can be seen that the detection probabilities are high for small
values of $\sigma_U^2$ when S = 5, 10 dB. Notably, there is little
difference between the performances of the detectors for these
values of SNR. However, for S = 0 dB it is clear that the detector
based on the normal assumption outperforms the detector based on
the small jitter approximation. Another result of interest is that,
for S = 0 dB, the detection probability does not tend to one as
$\sigma_U^2$ increases. In fact, if $\sigma_U^2$ is increased
beyond one, the detection probability actually decreases. This
effect can be attributed to an increase in the variances of the
test statistics accompanied by decreases in the mean. Additional
simulations, not shown here, verify that for a given SNR this
effect disappears as the sample length n increases.

The above experiments are repeated for uniformly distributed jitter
with the results shown in Figure 2 for $\alpha = 0.01$. For a given
jitter variance, it can be seen that the jitter detectors perform
better for uniformly distributed jitter as compared to normally
distributed jitter. This is particularly so for S = 0 dB.

Figure 2: Detection probability (in %) of the jitter detectors
based on a) the small jitter approximation and b) the normal
assumption in the presence of uniformly distributed jitter. The
false alarm probability is 1 %. The legend is given in Table 2.

4. CONCLUSIONS

Two closely-related methods were proposed for detecting the
presence of symmetrically distributed independent timing jitter in
a complex harmonic. One detector was obtained by employing a small
jitter approximation while the other was obtained through a normal
assumption. The conditions required for consistency of the
detectors were derived. Using these results it was shown that the
detectors are consistent for the important special cases of
normally distributed jitter and uniformly distributed jitter,
although mild conditions apply for the case of uniformly
distributed jitter. Simulation results showed that both detectors
maintain the set level for an SNR of 0 dB, although they are
excessively conservative for small sample lengths. The detectors
also exhibit good performance under the alternative for relatively
small sample lengths.

ABSTRACT

We  analyze  sonar  recordings  of  various  boats  as  well 
as  ambient  sea  noise  using  nonlinear  dynamical  sig¬ 
nal  models.  Specifically,  we  discuss  the  estimation  of 
the  parameters  of  nonlinear  delay  differential  equations 
from  data.  Using  the  model  parameters  as  classifi¬ 
cation  features  we  implement  a  three  class  Bayesian 
minimum-error-rate  classifier  and  demonstrate  almost 
perfect  classification  of  the  data  set  considered.  This 
indicates  that  classifiers  based  on  nonlinear  dynamical 
models  can  be  useful  in  sonar  applications. 

1.  INTRODUCTION 

An important task in underwater passive sonar signal processing is the determination of target signatures based on the narrow-band signal content in the received signal. However, identification of the harmonics and their interrelations for such sources is often difficult, particularly in shallow waters where multipath propagation is inevitable and the channel may vary considerably with target distance. In this paper we present an alternative approach motivated by recent advances in nonlinear dynamical systems theory.

We present the theory and application of a recently developed algorithm for passive signature characterization and classification using nonlinear signal models [1]. This algorithm utilizes a robust method to estimate delay differential equation (DDE) models from data. The model parameters are estimated using generalized higher-order correlation functions, similar to the Yule-Walker equations in parametric signal processing. This method involves estimation of both higher-order statistical moments and dynamical moments. The dynamical moments are also of higher order, but involve the derivative of the signal [2].

Using this theory we outline the design of practical characterization and classification algorithms for applications in passive sonar signal processing. From the model coefficients, which reflect low-dimensional dynamical information in a compact way, we define a feature space. These features can easily be used for classification purposes. The subsequent partitioning of the feature space can be done by employing any standard discrimination method such as Neyman-Pearson, Bayesian or neural networks. In this work we implement a Bayesian minimum-error-rate classifier [3].

Finally, we apply these ideas to the analysis of real-world passive sonar recordings from the Baltic Sea off the east coast of Sweden, in shallow water of an approximately constant depth of 40 meters. The data set consists of recordings of big and small boats as well as ambient sea noise.


2.  ESTIMATION  OF  NONLINEAR 
DYNAMICAL  SIGNAL  MODELS 

Here we present a brief description of our model estimation procedure using time-domain differential equation signal models. For a more detailed description the reader is referred to [1], [2]. First, we hypothesize that we observe a scalar data stream $x(t)$ generated by some measurement of an accessible observable of a physical process. We hypothesize that the process evolution itself can be approximated by deterministic, relatively low-dimensional dynamics, but can include purely stochastic elements (i.e. noise) as well. We will also utilize up to $D$ time-delayed copies of $x(t)$, written $x(t - d\tau)$ with $1 \le d \le D$. Hence our general model form is

$$\dot{x}(t) = F[x(t), x(t - \tau), \ldots, x(t - D\tau)]. \qquad (1)$$

The function $F$ is often expanded in terms of some basis functions. Here we restrict our attention to a two-delay second order model

$$\dot{x} = a_1 x_{\tau_1} + a_2 x_{\tau_2} + a_3 x_{\tau_1} x_{\tau_2} \qquad (2)$$

where the shorthand notations $x = x(t)$ and $x_\tau = x(t - \tau)$ have been introduced. The unknown model coefficients $a_1$, $a_2$, and $a_3$ are estimated for each observation window, and will comprise our feature space. Below we present an estimation method that is numerically robust and can explicitly preserve some of the nonlinear correlations possibly present in the original time series. Briefly, we multiply Eq. (2) by each basis term $x_{\tau_1}$, $x_{\tau_2}$, and $x_{\tau_1} x_{\tau_2}$, and average over an observation window of length $T$; the model coefficients are then computed by solving the following linear equation:


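$$R \cdot A = B \qquad (3)$$

where

$$R = \begin{pmatrix}
\langle x_{\tau_1}^2 \rangle & \langle x_{\tau_1} x_{\tau_2} \rangle & \langle x_{\tau_1}^2 x_{\tau_2} \rangle \\
\langle x_{\tau_1} x_{\tau_2} \rangle & \langle x_{\tau_2}^2 \rangle & \langle x_{\tau_1} x_{\tau_2}^2 \rangle \\
\langle x_{\tau_1}^2 x_{\tau_2} \rangle & \langle x_{\tau_1} x_{\tau_2}^2 \rangle & \langle x_{\tau_1}^2 x_{\tau_2}^2 \rangle
\end{pmatrix}, \quad
A = \begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix}, \quad
B = \begin{pmatrix} \langle \dot{x} x_{\tau_1} \rangle \\ \langle \dot{x} x_{\tau_2} \rangle \\ \langle \dot{x} x_{\tau_1} x_{\tau_2} \rangle \end{pmatrix}.$$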

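Here $\langle \cdot \rangle$ stands for the expectation value. Note that the correlations involving the signal derivative can be calculated from the derivative of the correlation function, i.e.

$$\langle \dot{x} x_{\tau_1} \rangle = \frac{d}{d\tau_1} \langle x x_{\tau_1} \rangle,$$

and

$$\langle \dot{x} x_{\tau_1} x_{\tau_2} \rangle = \frac{d}{d\tau_1} \langle x x_{\tau_1} x_{\tau_2} \rangle + \frac{d}{d\tau_2} \langle x x_{\tau_1} x_{\tau_2} \rangle. \qquad (4)$$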


These formulas are valid in the long window limit for a bounded stationary signal $x(t)$. The main practical advantage of using Eq. (3) is that we can avoid computing the signal derivatives, which is the main difficulty for noisy signals. The expectation values on the left hand side of Eq. (3) can be expressed as standard higher-order data moment functions [4]. We also note that the dynamical moments involving $\dot{x}$ arise exactly because of the dynamical representation and express information not utilized in standard higher-order methods.
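As an illustration of Eqs. (2)-(4), the estimation step can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `moment` and `estimate_dde`, the sampling interval `dt`, and the central difference in the delay used to approximate the derivatives in Eq. (4) are all choices made here for illustration.

```python
import numpy as np

def moment(x, lags, powers):
    """Average of prod_k x(n - lag_k)**power_k over the N - m usable samples."""
    m = max(lags)
    n = np.arange(m, len(x))
    prod = np.ones(len(n))
    for lag, power in zip(lags, powers):
        prod = prod * x[n - lag] ** power
    return prod.mean()

def estimate_dde(x, i, j, dt=1.0):
    """Estimate (a1, a2, a3) of Eq. (2) for discrete delays i != j (both >= 1)."""
    x = np.asarray(x, dtype=float)
    x = (x - x.mean()) / x.std()   # zero mean, unit variance window

    # Static moments <x_i^b x_j^c>: the entries of R in Eq. (3).
    def s(b, c):
        return moment(x, (i, j), (b, c))
    R = np.array([[s(2, 0), s(1, 1), s(2, 1)],
                  [s(1, 1), s(0, 2), s(1, 2)],
                  [s(2, 1), s(1, 2), s(2, 2)]])

    # Dynamical moments via Eq. (4): derivatives with respect to the delays,
    # approximated here (an assumption) by central differences of one sample.
    def c2(k):
        return moment(x, (0, k), (1, 1))           # <x x_k>
    def c3(k, l):
        return moment(x, (0, k, l), (1, 1, 1))     # <x x_k x_l>
    b1 = (c2(i + 1) - c2(i - 1)) / (2 * dt)
    b2 = (c2(j + 1) - c2(j - 1)) / (2 * dt)
    b3 = (c3(i + 1, j) - c3(i - 1, j) + c3(i, j + 1) - c3(i, j - 1)) / (2 * dt)

    return np.linalg.solve(R, np.array([b1, b2, b3]))
```

For a pure sinusoid sampled at 40 samples per period with delays of 10 and 15 samples, this sketch should return $a_1$ close to minus the angular frequency and $a_2$, $a_3$ close to zero, because a quarter-period delay turns the derivative into a scaled copy of the delayed signal.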


3.  MINIMUM-ERROR-RATE 
CLASSIFICATION 

The formalism of the previous section allows us to estimate the parameters of a given dynamical data model using the standard and dynamical correlation functions. Our main aim here is to use these ideas to design detectors and classifiers using standard feature discrimination methods. First, we standardize the features $a_1$, $a_2$ and $a_3$ by requiring that all signal observation windows are normalized to zero mean and unit variance. Thus a particular observation window can be represented as a point in the three-dimensional feature space, and the set of all the observations forms a distribution for a specific class in this space.

The subsequent partitioning of the feature space, i.e. classifier design, can be done by employing any standard discrimination method such as Neyman-Pearson, Bayesian or neural networks. We chose a Bayesian approach to build a minimum-error-rate classifier. Let $\Omega = \{\omega_1, \ldots, \omega_s\}$ be our set of $s$ data classes. For each class $\omega_n$ we make $N_n$ observations, using a fixed window length. By estimating the model parameters $a_1$, $a_2$ and $a_3$ with the procedure described in the previous section, we obtain for each class a set of model coefficients $\{A_i^{(n)}\}_{i=1}^{N_n}$, where the vector notation $A = (a_1, a_2, a_3)$ has been introduced. There are many ways to represent a minimum-error-rate classifier; one way is in terms of a set of discriminant functions $g_i(A)$. The classifier is said to assign a feature vector $A$ to class $\omega_i$ if

$$g_i(A) > g_j(A) \quad \text{for all } j \ne i. \qquad (5)$$

A Bayesian minimum-error-rate classifier can easily be represented in this way [3]. We can simply choose $g_i(A) = P(\omega_i | A)$ so that the maximum discriminant function corresponds to the maximum a posteriori probability. This choice of discriminant function is in no way unique. We can replace every $g_i(A)$ with $f(g_i(A))$, where $f$ is a monotonically increasing function, without changing the classification ability. A particularly useful form is

$$g_i(A) = \log p(A | \omega_i) + \log P(\omega_i). \qquad (6)$$

Further, if we assume that the feature distributions are multivariate normal and that the a priori probabilities of all classes are equal, Eq. (6) becomes

$$g_i(A) = -\frac{1}{2}\left[ A^t \Sigma_i^{-1} A - 2 A^t \Sigma_i^{-1} \mu_i + \mu_i^t \Sigma_i^{-1} \mu_i \right] - \frac{1}{2} \log |\Sigma_i| \qquad (7)$$

where $\mu_i$ is the mean and $\Sigma_i$ is the covariance matrix of class $i$. In practical cases the means and the covariance matrices are unknown and have to be estimated from a training set, unless the analytic signal forms are known. While the distribution of the DDE model coefficients is typically not Gaussian, for low SNR it is nearly so. We have found that alternative methods that do not rely on the normality assumption, e.g. logistic discrimination, do not show significant improvement.

Figure 1: Typical time series from a small boat (top), a big boat (middle) and noise (bottom).
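The discriminant of Eq. (7) can be sketched as follows. This is an illustrative sketch only: the function names `fit_gaussian_classes` and `classify` are hypothetical, equal priors are assumed as in the text, and the quadratic form is written in the equivalent compact form $-\frac{1}{2}(A - \mu_i)^t \Sigma_i^{-1} (A - \mu_i)$.

```python
import numpy as np

def fit_gaussian_classes(features_by_class):
    """Estimate (mean, inverse covariance, log-determinant) for each class.

    features_by_class: list of arrays of shape (N_n, 3), one per class.
    """
    params = []
    for F in features_by_class:
        mu = F.mean(axis=0)
        cov = np.cov(F, rowvar=False)          # rows are observations
        _, logdet = np.linalg.slogdet(cov)
        params.append((mu, np.linalg.inv(cov), logdet))
    return params

def classify(A, params):
    """Return the index i that maximizes the discriminant g_i(A) of Eq. (7)."""
    scores = []
    for mu, inv_cov, logdet in params:
        d = A - mu
        scores.append(-0.5 * d @ inv_cov @ d - 0.5 * logdet)
    return int(np.argmax(scores))
```

In use, `fit_gaussian_classes` would be called on the training features of each class and `classify` on each test feature vector.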

4.  DATA  ANALYSIS 

In this section we consider the analysis of a sonar data set from a sea trial conducted by the Swedish Defence Research Agency. Our primary objective is to demonstrate robust classification using the dynamical classification method described in the previous sections.

4.1.  The  data  set 

Sonar recordings were performed in the Baltic Sea off the east coast of Sweden, in shallow waters of an approximately constant depth of 40 meters. The data set consists of hydrophone recordings from three small boats and one big boat passing over the hydrophone, as well as ambient sea noise. All data were recorded at a sampling rate of 20 kHz. During the recordings the data were low-pass filtered to the band 0 to 6 kHz, hence there is no problem with aliasing. In Fig. 1 typical time series from a small boat, a big boat and noise are displayed. Corresponding power spectra are displayed in Fig. 2.

Figure 2: Typical power spectrum from a small boat (top), a big boat (middle) and noise (bottom).

4.2.  Classification  of  real-world  sonar  data 

We now describe the classification of the real-world sonar data utilizing the two-delay second order DDE model given by Eq. (2). First we select one segment of length 40 seconds from each boat recording (the closest point of approach is included in the segment) and a noise segment of length 60 seconds. The data segments are then windowed into 1 second (20000 samples) observation windows, with a 0.5 second (10000 samples) window shift to provide independent samples. The three parameters $a_1$, $a_2$ and $a_3$ are estimated with Eq. (3). To solve Eq. (3) all the moments in the matrix equation have to be estimated, and we use an unbiased estimate defined as

$$\langle x^a(n)\, x^b(n - i)\, x^c(n - j) \rangle = \frac{1}{N - m} \sum_{n=m+1}^{N} x^a(n)\, x^b(n - i)\, x^c(n - j) \qquad (8)$$

where $m$ is equal to the largest of $i$ and $j$; $N$ is the window length and $i$ and $j$ are the discrete delays corresponding to $\tau_1$ and $\tau_2$ respectively; the powers $a$, $b$ and $c$ are set to 0, 1 or 2 corresponding to the moment that has to be calculated. The window length and the delays have to be tuned to the signal of interest. We use the window length $N = 20000$ samples and use a subset (12.5%) of the small boat recordings to select the two delays from the maximum significance of $L$, where $L$ is given by

$$L = \sqrt{a_1^2(i,j) + a_2^2(i,j) + a_3^2(i,j)}. \qquad (9)$$

The significance of $L$ is displayed in Fig. 3. From the figure we can identify a maximum at $i = 7$ and $j = 33$ samples. The feature space spanned by the three model parameters is shown in Fig. 4. One can observe clear separation between the three data classes.

Figure 3: Significance of $L$ estimated from a subclass of the small boats.

True class      Output decision
                Noise     Small boat     Big boat
Noise           100%      0%             0%
Small boat      0%        100%           0%
Big boat        0%        1%             99%

Table 1: The confusion matrix shows that the dynamical classifier provides the correct class decision in virtually all cases.
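The windowing described in this subsection (1 second windows, 0.5 second shift, 20 kHz sampling, each window normalized to zero mean and unit variance) can be sketched as below. The function name `observation_windows` and its parameter names are assumptions made for this illustration.

```python
import numpy as np

def observation_windows(segment, fs=20000, window_s=1.0, shift_s=0.5):
    """Split a recording into overlapping, individually normalized windows."""
    n_win = int(round(window_s * fs))      # samples per window
    n_shift = int(round(shift_s * fs))     # samples between window starts
    starts = range(0, len(segment) - n_win + 1, n_shift)
    windows = np.stack([segment[s:s + n_win] for s in starts]).astype(float)
    # Normalize every window to zero mean and unit variance.
    windows -= windows.mean(axis=1, keepdims=True)
    windows /= windows.std(axis=1, keepdims=True)
    return windows
```

With these settings a 40 second segment (800000 samples) yields 79 observation windows of 20000 samples each.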

Next, we implement a three-class minimum-error-rate classifier following the outline in Section 3. The three classes are small boats, big boats and noise. A randomly selected subset (70%) of the features from each class was used as a training set (i.e. for estimating the means and covariance matrices). The remaining 30% of the distributions were then used for testing the classifier. The training and testing were repeated a hundred times to remove fluctuations in the classification performance. The results of the numerical analysis above are summarized in the confusion matrix of Table 1. This matrix consists of a table showing the true class of the input features versus the output of the classification algorithm using the testing set of features. As can be seen, this method provides almost perfect classification of the data set.
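The evaluation protocol above (random 70/30 split, confusion matrix in percent per true class) can be sketched as follows. The helper names `split_indices` and `confusion_matrix_percent` are hypothetical, and classes are assumed to be encoded as integers 0, 1, 2.

```python
import numpy as np

def split_indices(n, train_fraction=0.7, rng=None):
    """Randomly split the indices 0..n-1 into training and testing sets."""
    rng = np.random.default_rng() if rng is None else rng
    perm = rng.permutation(n)
    n_train = int(round(train_fraction * n))
    return perm[:n_train], perm[n_train:]

def confusion_matrix_percent(true_labels, predicted_labels, n_classes):
    """Rows are true classes, columns are decisions; each row sums to 100."""
    counts = np.zeros((n_classes, n_classes))
    for t, p in zip(true_labels, predicted_labels):
        counts[t, p] += 1
    return 100.0 * counts / counts.sum(axis=1, keepdims=True)
```

Repeating the split and accumulating the labels over many repetitions before calling `confusion_matrix_percent` would average out the fluctuations mentioned in the text.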


Figure  4:  Parameter  distributions  for  small  boats 
(dots),  big  boats  (circles)  and  noise  (stars). 

5.  CONCLUSIONS 

We have discussed a method for the estimation of DDE signal models motivated by the Yule-Walker equations, which provides computational speed, numerical stability and noise robustness. The model parameters can be used to represent a wide range of signals; furthermore, they can be used for detection and classification purposes. In this paper we presented a classification study of real-world sonar data, derived from a sea trial in the Baltic Sea. We implemented a three-class minimum-error-rate classifier based on a two-delay second order DDE model, and showed almost perfect separation between the classes. This implies that classifiers based on nonlinear dynamical models can be useful in sonar applications.

Abstract. A problem occurring in radio astronomy is the detection and cancellation of spatially correlated interfering signals entering via the sidelobes of the telescopes in an array. A complicating factor is that the noise powers can be different at each telescope. For the case that the sensors are uncalibrated, we formulate the detection problem as a test on the covariance structure, state the GLRT for this problem, and relate it to a simpler ad hoc detector. We derive algorithms to estimate the noise powers and the subspace of interferer signature vectors. Once the subspace is estimated, the interference can be projected out. We compare this method to the conventional multichannel subspace detector and show its robustness to non-identical channels on data collected with the Westerbork radio telescope.

1.  INTRODUCTION 

In this paper we study the detection and suppression of spatially correlated signals impinging on an array of uncalibrated non-identical sensors, in the presence of spatially uncorrelated noise. The noise covariance matrix is diagonal but otherwise unknown.

The motivation for this study comes from an application in radio astronomy, where we wish to detect and suppress man-made interfering sources impinging on an array of telescopes. The output of the receiver after processing is essentially a sequence of short-term (~10 second) sample correlation matrices, composed of the contributions of astronomical sources in the pointing direction, the additive receiver noise, and the interference. The receiver noise is largely independent among the sensors, but the receiver gains are not identical, with differences of up to a few dB. Until now, calibration of these gains has been done separately and taken into account off-line. An interfering source is usually in the near field and received through the sidelobes of the parabolic dishes, hence the received signals are correlated but with arbitrary unknown gains. Our aim is to detect and cancel the interference on-line; this requires on-line calibration processing as well.

Two types of interference play a role: intermittent signals (e.g., TDMA signals as in the GSM system, certain radar signals) and continuously present signals (e.g., television signals, GPS). Our approach for intermittent signals is to detect their presence on-line over millisecond periods, and discard those periods which are deemed contaminated (temporal excision) [1]. For continuous interference, we also wish to estimate the signature (direction) vector, so that we can project out that dimension from the data. This is more ambitious, and also requires modifications to the way the astronomical data is processed after recording [2]. Note that the astronomical signals of interest are much weaker than the receiver noise and hence it is necessary to detect interference even if it is much below the noise power. The astronomical signals themselves are too weak to be detected at these short time scales.


When the interferers are weaker than the system noise and the receivers are non-identical, the change in eigenstructure of the sample covariance matrix is not detectable unless one of two steps is taken. The first is pre-calibration and whitening. The second, which is easier to implement on-line, is to use a different model where the noise covariance matrix is assumed diagonal but not necessarily equal to $\sigma^2 I$, and to detect deviation from this nominal model. This is the approach taken here. The Generalized Likelihood Ratio Test (GLRT) for this problem turns out to be based on the determinant of the sample correlation matrix, a fact which is not very well known in signal processing but has been used for a long time in certain other disciplines.

We demonstrate the results of the excision using the GLRT detector and compare it to a detector which assumes identical receivers. We also demonstrate the improvement in the estimate of the spatial signature as compared to the usual eigendecomposition technique.

2.  PROBLEM  FORMULATION 

Assume that we have a set of $q$ narrow-band Gaussian signals impinging on an array of $p$ sensors. The received signal can be described in complex envelope form by

$$x(k) = \sum_{i=1}^{q} a_i s_i(k) + n(k) = A s(k) + n(k) \qquad (1)$$

where $x(k) = [x_1(k), \ldots, x_p(k)]^T$ is a $p \times 1$ vector of received signals at sample times $k$, and $A = [a_1, \ldots, a_q]$, where $a_i$ is the array response vector for the $i$th signal. $s(k) = [s_1(k), \ldots, s_q(k)]^T$ is a $q \times 1$ vector of Gaussian source signals at sample times $k$ with covariance matrix $R_s = E(ss^H)$, and $n(k)$ is the $p \times 1$ additive noise vector, which is assumed to have independent Gaussian entries with unknown diagonal covariance matrix $R_n = \mathrm{diag}\{\nu_1, \ldots, \nu_p\}$.

We would like to detect the presence of signals satisfying the above model, i.e., given data vectors $x(1), \ldots, x(N)$, decide whether $q = 0$ or $q > 0$. Secondly, if $q > 0$, we would like to detect $q$ and estimate the interfering subspace, i.e., span(A), so that we can project out this subspace from the data. We do not assume parametric knowledge of the array manifold (since the interferers enter in the sidelobes) or a calibration of the noise power in each channel. Under these assumptions the only way to distinguish between signal and noise is to use the fact that the noise is spatially uncorrelated, hence has a diagonal covariance matrix.

The detection problem is thus given by a collection of hypotheses ($\mathcal{CN}(0, R)$ denotes the zero-mean complex normal distribution with covariance $R$)

$$\begin{aligned}
\mathcal{H}_q &: x(k) \sim \mathcal{CN}(0, R_q), \qquad q = 1, 2, \ldots \\
\mathcal{H}' &: x(k) \sim \mathcal{CN}(0, R')
\end{aligned} \qquad (2)$$

where $R_q$ is the covariance matrix of the model with $q$ interferers,

$$R_q = AA^H + D, \quad \text{where } A: p \times q, \ D \text{ diagonal},$$

and $\mathcal{H}'$ corresponds to a default hypothesis of an arbitrary (unstructured) positive definite matrix $R'$. (Without loss of generality, we absorbed the interferer covariance matrix $R_s$ in $A$.)

As it turns out, this problem has been studied in the psychometrics, biometrics and statistics literature since the 1930s under the heading of factor analysis (but usually for real-valued matrices) [3,4]. The problem has received much less attention in the signal processing literature. Related recent work includes e.g. direction estimation using two subarrays with mutually uncorrelated noise [5,6].


3. THE GLRT DETECTOR

In this section we give a short derivation of the GLRT for the detection problem $\mathcal{H}_q$ versus $\mathcal{H}'$. Note that both hypotheses are composite and we have to derive maximum likelihood estimates of the parameters for each of the hypotheses. Under $\mathcal{H}_q$, the likelihood function is given by

$$L(X | \mathcal{H}_q) = L(X | R_q) = \left[ \frac{1}{\pi^p |R_q|} \, e^{-\mathrm{tr}(R_q^{-1} \hat{R})} \right]^N$$

where $X = [x(1), \ldots, x(N)]$ and $\hat{R} = \frac{1}{N} \sum_{k=1}^{N} x(k) x(k)^H$ is the sample covariance matrix, $|\cdot|$ denotes the determinant and $\mathrm{tr}(\cdot)$ the trace operator.

The ML estimate of $R_q$ is found by maximizing $L(X | R_q)$ over the parameters of the model $R_q = AA^H + D$, or equivalently (up to an additive constant) the log-likelihood function

$$\mathcal{L}(X | R_q) = N \left( -\ln |R_q| - \mathrm{tr}(R_q^{-1} \hat{R}) \right).$$

Denote the estimate by $\hat{R}_q = \hat{A}\hat{A}^H + \hat{D}$. Under $\mathcal{H}'$ we obtain that the ML estimate of $R'$ is given by $\hat{R}$, the sample covariance matrix. The log-likelihood GLRT test statistic is thus given by

$$\ln \frac{L(X | \mathcal{H}_q)}{L(X | \mathcal{H}')} = -N \left( \mathrm{tr}(\hat{R}_q^{-1} \hat{R}) - \ln |\hat{R}_q^{-1} \hat{R}| - p \right).$$

A further result is that the ML estimate of $R_q$ is such that $\mathrm{tr}(\hat{R}_q^{-1} \hat{R}) = p$, so that we can base the test on

$$T_q(X) := -N \ln |\hat{R}_q^{-1} \hat{R}|. \qquad (3)$$

If we generalize the results in [3,4] to complex data, we obtain the following.

Lemma 3.1 If $\mathcal{H}_q$ is true and $N$ is moderately large (say $N - q > 50$), then $2 T_q(X)$ has approximately a $\chi^2$ distribution with $\nu = (p - q)^2 - p$ degrees of freedom.

In view of results of Box and Bartlett, a better fit is obtained by replacing $N$ in (3) by [3]

$$N' = N - \frac{1}{6}(2p + 5) - \frac{2}{3} q.$$

This provides a threshold for a test of $\mathcal{H}_q$ versus $\mathcal{H}'$ corresponding to a desired probability of false alarm $P_{FA}$. The test replaces the more familiar eigenvalue test on the rank of $\hat{R}$ in the case of white noise, $D = \sigma^2 I$. Note that before we can perform the test, we need to compute the ML estimates of $A: p \times q$ and $D$ (see Section 5).


4. TEST FOR DIAGONALITY

Under $\mathcal{H}_0$ we can make the test more explicit. To estimate $R_0 = D$, we set the derivative of $\mathcal{L}$ with respect to the parameters of $D$ to zero, which immediately gives $\hat{D} = \mathrm{diag}(\hat{R})$. Therefore the GLRT test statistic is given by

$$\frac{L(X | \mathcal{H}_0)}{L(X | \mathcal{H}')} = \frac{|\hat{R}|^N}{\prod_{i=1}^{p} \hat{r}_{ii}^N} = |C|^N \qquad (4)$$

where $C$ is the sample correlation matrix given by $C = W \hat{R} W$ and $W = \mathrm{diag}\{\hat{r}_{11}, \ldots, \hat{r}_{pp}\}^{-1/2}$. Note that $0 \le |C| \le 1$, where equality to 1 is obtained asymptotically for $N \to \infty$ if $q = 0$. Thus, for a certain threshold $\gamma = \gamma(N)$ between 0 and 1, the GLRT is

$$T_1 = |C| \ \underset{\mathcal{H}_1}{\overset{\mathcal{H}_0}{\gtrless}} \ \gamma \qquad (5)$$

This result is identical to that in the real-valued case (see [4, p. 137]). The expression is rather satisfactory since, in the absence of sensor calibration data, all the spatial information exists in the spatial correlation coefficients between the different sensors, and the GLRT suggests a proper way of combining these different correlations. It is also quite easy to implement and does not involve any eigenstructure computations. From Lemma 3.1, under $\mathcal{H}_0$ we know that $-2N \ln |C|$ has asymptotically a chi-square distribution with $p^2 - p$ degrees of freedom. Again, a better asymptotic fit is obtained by replacing $N$ by $N' = N - \frac{1}{6}(2p + 11)$.

A related ad hoc detector to which we can compare is based on the Frobenius norm of the off-diagonal entries of $C$. Since the diagonal entries are equal to 1, it is equivalent to take the norm of $C$ itself, i.e.,

$$T_2 = \|C\|_F^2 \ \underset{\mathcal{H}_0}{\overset{\mathcal{H}_1}{\gtrless}} \ \gamma' \qquad (6)$$

In fact, it is straightforward to prove that, for weak signals, the performance of this detector must be approximately equal to that of the GLRT. Indeed, for weak signals, the eigenvalues of $C$ are equal to $\lambda_i = 1 + \epsilon_i$, for small $\epsilon_i$. Note that $\mathrm{tr}(C) = p \Rightarrow \sum_i \lambda_i = p \Rightarrow \sum_i \epsilon_i = 0$. We can write

$$T_1 = \prod_i \lambda_i = \prod_i (1 + \epsilon_i)$$
$$\Rightarrow \ \ln(T_1) = \sum_i \ln \lambda_i = \sum_i \left( \epsilon_i - \tfrac{1}{2} \epsilon_i^2 + O(\epsilon_i^3) \right) = -\tfrac{1}{2} \sum_i \epsilon_i^2 + O(\epsilon^3)$$

whereas

$$T_2 = \|C\|_F^2 = \sum_i \lambda_i^2 = \sum_i \left( 1 + 2\epsilon_i + \epsilon_i^2 \right) = p + \sum_i \epsilon_i^2$$
$$\Rightarrow \ -\tfrac{1}{2}(T_2 - p) = -\tfrac{1}{2} \sum_i \epsilon_i^2.$$

Since a monotonic transformation of a test statistic does not change the outcome of the test if the threshold is modified accordingly,¹ the two detectors are equivalent up to third order. Computing the Frobenius norm requires only $O(p^2)$ operations, versus $O(p^3)$ for the determinant test (implemented via a Cholesky factorization of $C$).

¹ Note that the decisions in (5) and (6) are opposite, hence the change of sign in the second transformation.

Figure 1. Time-frequency spectrum of channel 1, showing GSM interference
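The two diagonality statistics of (5) and (6) can be computed from a block of data as sketched below. This is an illustrative sketch, not the authors' code: the function names and the $p \times N$ data layout are assumptions, and the determinant is obtained from a Cholesky factor as suggested in the text.

```python
import numpy as np

def correlation_matrix(X):
    """Sample correlation matrix C = W R W for a p x N complex data matrix X."""
    N = X.shape[1]
    R = X @ X.conj().T / N                      # sample covariance matrix
    w = 1.0 / np.sqrt(np.real(np.diag(R)))      # diagonal of W
    return w[:, None] * R * w[None, :]

def diagonality_statistics(X):
    """Return (T1, T2) = (|C|, ||C||_F^2)."""
    C = correlation_matrix(X)
    # C = L L^H, so |C| is the squared product of the diagonal of L.
    L = np.linalg.cholesky(C)
    T1 = float(np.prod(np.real(np.diag(L))) ** 2)
    T2 = float(np.linalg.norm(C, 'fro') ** 2)
    return T1, T2
```

For noise-only data with many samples, `T1` stays close to 1 and `T2` close to $p$, and $\ln T_1$ should be close to $-\frac{1}{2}(T_2 - p)$, in line with the expansion above. A correlated interferer lowers `T1` and raises `T2`.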

5.  PARAMETER  ESTIMATION 

To enable the GLRT, we have to find ML estimates of the factors $A: p \times q$ and $D$, both dependent on the choice of $q$. The largest permissible value of $q$ is that for which the number of degrees of freedom $\nu = (p - q)^2 - p \ge 0$, or $q \le p - \sqrt{p}$. For larger $q$, there is no identifiability of $A$ and $D$: any sample covariance matrix $\hat{R}$ can be fitted. Even for smaller $q$, $A$ can be identified only up to a $q \times q$ unitary transformation at the right, i.e., we can identify span(A). This generalizes the white noise case (where span(A) would be given by the eigenvectors of $\hat{R}$), and is sufficient for our application of interference cancellation.
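As a small worked example of the identifiability bound, the degrees of freedom and the largest permissible $q$ can be tabulated directly. The function names are hypothetical and chosen only for this illustration.

```python
def degrees_of_freedom(p, q):
    """Number of degrees of freedom nu = (p - q)^2 - p."""
    return (p - q) ** 2 - p

def max_interferers(p):
    """Largest q in 0..p with nu >= 0, i.e. q <= p - sqrt(p)."""
    return max(q for q in range(p + 1) if degrees_of_freedom(p, q) >= 0)
```

For an array of $p = 8$ sensors this gives a maximum of $q = 5$, since $(8 - 5)^2 - 8 = 1 \ge 0$ while $(8 - 6)^2 - 8 = -4 < 0$.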

For $q > 0$, there is no closed form solution to the estimation of the factors $A$ and $D$ in the ML estimation of $R_q = AA^H + D$. There are several approaches for this:

- Suppose that the optimal ML estimate $\hat{D}$ has been found. We can then whiten $\hat{R}$ to $\tilde{R} = \hat{D}^{-1/2} \hat{R} \hat{D}^{-1/2}$, and similarly the model, giving $\tilde{R}_q = \tilde{A}\tilde{A}^H + I$. Note that $|\hat{R}_q^{-1} \hat{R}| = |\tilde{R}_q^{-1} \tilde{R}|$, which is the usual problem for white noise, solved via an eigenvalue decomposition of $\tilde{R}$. This is equivalent to solving $\min \|\tilde{R} - (\tilde{A}\tilde{A}^H + I)\|_F^2$. Since $D$ is not known, this leads to an iteration where $A$ is plugged back, $D$ is estimated, etc.

  A related technique is alternating least squares, where we alternatingly minimize $\|\hat{R} - AA^H - D\|_F^2$ over $A$ keeping $D$ fixed, and over $D$ keeping $A$ fixed. (This is not equivalent to the determinant cost function unless a weighting by $D^{-1/2}$ is used.) Both iterative techniques tend to be very slow.

- Gauss-Newton iterations on the original (determinant) cost function, or on the (weighted) least squares cost. These require an accurate starting point.

- Ad hoc techniques for solving the least squares problem, possibly followed by a Gauss-Newton iteration. These techniques try to modify the diagonal of $\hat{R}$ such that the modified matrix has low rank $q$, hence can be factored as $AA^H$. For this we can exploit the fact that submatrices away from the main diagonal with $q + 1$ columns have rank $q$. See [7] for an example with $q = 1$.

More details on estimation algorithms will appear in an extended version of this paper.
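The unweighted alternating least squares iteration mentioned in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm used by the authors: the function name `factor_als`, the starting point $D = 0$ and the fixed iteration count are choices made here, and nothing prevents the estimated diagonal from becoming negative on difficult data.

```python
import numpy as np

def factor_als(R, q, n_iter=200):
    """Alternating least squares fit of R ~ A A^H + D, with D diagonal."""
    d = np.zeros(R.shape[0])                     # initial guess D = 0
    for _ in range(n_iter):
        # D fixed: the best rank-q positive semidefinite fit of R - D is
        # given by its q largest eigenpairs (eigh sorts in ascending order).
        lam, V = np.linalg.eigh(R - np.diag(d))
        lam_q = np.clip(lam[-q:], 0.0, None)
        A = V[:, -q:] * np.sqrt(lam_q)
        # A fixed: the best diagonal D is the diagonal of the residual.
        d = np.real(np.diag(R - A @ A.conj().T))
    return A, d
```

Only span(A) is meaningful in the result, since `A` is determined up to a unitary factor on the right.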

6.  APPLICATION  TO  RADIO  ASTRONOMY 

The main motivation for the detection and subspace estimation problem stems from applications to interference mitigation in radio astronomy. We give two illustrations.

Figure 2. Computational structure of the blanking process

We first apply the detector for $\mathcal{H}_0$ to sample data collected with the Westerbork radio telescope. The data was recorded using the 8-channel NOEMI project data recorder [1]. We selected a bandwidth of 2 MHz around 899 MHz, with a duration of 3 seconds. This band is contaminated with various GSM mobile telephony signals. Such signals are intermittent, occupying time slots of length 0.577 ms in frames of 4.6 ms. A segment of the data is shown in figure 1. The received data channels were split into subbands of 83 kHz by means of windowing and short-term FFTs, and subsequently correlated per frequency bin. Each covariance matrix is an average based on 21 samples and covers a period of 0.24 ms.

Our aim is to test for the presence of interference in each covariance matrix. Only if no interference is detected is the block passed to a long-term correlator. Two detectors have been applied. The first is the detector of (4), and the other one is given by


(7)


This detector is a GLRT assuming identical channels (or $D = \sigma^2 I$) [4].

Since $N = 21$ is small, we have not used the theoretical thresholds. Instead, we have excised the worst 10 percent of the data at each frequency channel and generated spectral estimates by further averaging the covariance matrices of the remaining 90 percent of the data. The processing structure is shown in figure 2.
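The excision step for one frequency channel can be sketched as follows. The function name `excise_and_average` is hypothetical, and the sketch assumes a per-block statistic for which small values indicate interference, as for $|C|$ in (5). For a statistic with the opposite sense, the sign of `stats` would be flipped.

```python
import numpy as np

def excise_and_average(cov_blocks, stats, fraction=0.10):
    """Drop the worst `fraction` of blocks and average the remaining ones.

    cov_blocks: array of shape (K, p, p), short-term covariance matrices.
    stats:      array of shape (K,), small values = most suspicious blocks.
    """
    cov_blocks = np.asarray(cov_blocks)
    stats = np.asarray(stats)
    n_drop = int(round(fraction * len(stats)))
    order = np.argsort(stats)              # ascending: most suspicious first
    keep = np.sort(order[n_drop:])         # indices of the retained blocks
    return cov_blocks[keep].mean(axis=0), keep
```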

Figure 3 shows the power spectrum of channel 1 and the cross-spectrum of channels 1 and 3, respectively, before and after blanking. Without excision, we can see that several interfering signals are present, most weak but one rather strong. We can clearly see that while both detectors properly excised the strong interference, the detector based on the $D = \sigma^2 I$ assumption failed to excise the weak features of the interference.

Figure 3. Power spectra and cross-spectra of channels 1 and 3,
before and after interference excision

In a second application, we wish to spatially filter out continuously
present interference. The approach is to estimate span(A),
and to apply a projector $P_A^\perp$ onto the orthogonal complement of the
span. Here, we describe only a limited-scope simulation on synthetic
data, where we estimate a rank-1 subspace (i) using factor
analysis, and for comparison (ii) using eigendecomposition assuming
that $D = \sigma^2 I$, or (iii) using eigendecomposition after whitening
by $D^{-1/2}$, assuming the true D is known from calibration. The
algorithm used for factor analysis is a non-iterative ad hoc technique
used to obtain a consistent initial estimate, followed by a
Gauss-Newton optimization of the weighted least squares cost function
(3 iterations). The weighting is by $D^{-1/2}$ as obtained from the ad
hoc technique. We have generated covariance matrices based on
the model (1) with q = 1, and show the residual interference power
after projection, i.e., $\|P_A^\perp a\|$, as a function of the number of samples N,
mean noise power, and deviation in noise power. The noise powers
are randomly generated at the beginning of the simulation, uniformly
in an interval. Legends in the graphs indicate the nominal
noise power and the maximal deviation. All simulations use p = 8
sensors and q = 1 interferer, and a nominal interference to noise
ratio per channel of 0 dB.
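For the rank-1 case the residual after projection is easy to illustrate. The sketch below assumes the projector is $I - \hat{a}\hat{a}^H / (\hat{a}^H \hat{a})$ for an estimated direction vector $\hat{a}$; the function name and the plain-list vectors are hypothetical.

```python
import math

def residual_after_projection(a_true, a_est):
    """Return ||P a_true|| with P = I - a_est a_est^H / (a_est^H a_est)."""
    # Inner product a_est^H a_true (conjugate() also works on plain floats).
    inner = sum(e.conjugate() * x for e, x in zip(a_est, a_true))
    norm2 = sum(abs(e) ** 2 for e in a_est)
    # Remove the component of a_true that lies along a_est.
    resid = [x - e * inner / norm2 for x, e in zip(a_true, a_est)]
    return math.sqrt(sum(abs(r) ** 2 for r in resid))
```

A perfect estimate of the direction (up to scale) gives a zero residual, while an orthogonal estimate leaves the interferer untouched.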

The results are shown in figure 4. The first graph shows the
residual interference power for varying maximal deviations; the
second graph shows the residual for a varying number of samples N
and a maximal deviation of 3 dB of the noise powers. The figures
indicate that even for small deviations of the noise powers it is
essential to take them into account. Furthermore, the estimates from
the factor analysis are nearly as good as can be obtained via whitening
with known noise powers.
Abstract - In this paper, an RLS algorithm with selective modification
of the system estimation covariance matrix is employed
to track the rapidly changing components of system parameters.
A new on-line wavelet detector is designed for accurately
identifying the changing locations and the branches of changing
parameters. Employing these techniques, the tracking performance
of the proposed algorithm for rapidly changing systems
can be significantly improved.

Keywords - RLS algorithm, dyadic wavelet transform, recursive
wavelet change detector, covariance matrix modification.

I.  Introduction 

When a time-varying system is subject to rare but abrupt
(jumping) changes, the parameters estimated by conventional
adaptive algorithms cannot track the variations of the true system
parameters in the vicinity of these jumping locations, resulting
in the so-called 'lag' estimation. Three methods can be used
to mitigate the effect of 'lag' estimation. One is to use variable
forgetting factor RLS algorithms [1]. The second is to increase
the system estimation covariance matrix at the jumping locations
[2], [3]. The third includes various Bayesian Kalman filtering
algorithms [4], [5]. In this paper, the second method will
be adopted to track the abrupt changes of system parameters.
One difficulty of this method is how to identify on-line
the locations of the abrupt changes, which are unknown to users. Some
approaches have been developed towards this task [6]-[8]. However,
an obvious trade-off between detection sensitivity and
robustness exists in these methods. The design of a simple
but efficient detection and modification algorithm needs further
investigation.

To identify the rapidly changing points effectively, a new on-line
detection algorithm based on a multiscale product sequence
in the wavelet domain is proposed in this paper. The proposed
wavelet detector can efficiently suppress background noise and
enhance the abruptly changing components, so that it is very
robust to interference and sensitive to jumping changes compared
with conventional detectors. A new algorithm for
selectively modifying the elements of the covariance matrix is
also proposed.

II.  Recursive  Adaptive  Algorithm  for  Rapidly 
Changing  (Jumping)  Systems 

A.  Tracking  by  changing  point  detection 

A time-varying system can commonly be represented by a
linear regression equation, and the changes of the system parameters
can be modeled with a first-order random walk
model [2],

$$\theta_{t+1} = \theta_t + w_t, \tag{1}$$

$$y_t = \varphi_t^T \theta_t + e_t, \tag{2}$$

where $\theta_t$ is the true system parameter vector of size N x 1,
$y_t$ is the scalar observation (output) signal, $\varphi_t$ is the system
(input) regressor vector of size N x 1, $w_t$ is the system noise
vector of size N x 1 and $e_t$ is the measurement noise signal.
When the variations of the system parameters are slow enough, an
RLS algorithm can be used to track the time-varying system [2],

$$\hat{\theta}_t = \hat{\theta}_{t-1} + G_t \left( y_t - \varphi_t^T \hat{\theta}_{t-1} \right), \tag{3}$$

$$G_t = \frac{P_{t-1}\varphi_t}{\lambda_t + \varphi_t^T P_{t-1} \varphi_t}, \tag{4}$$

$$P_t = \frac{1}{\lambda_t}\left[ P_{t-1} - \frac{P_{t-1}\varphi_t \varphi_t^T P_{t-1}}{\lambda_t + \varphi_t^T P_{t-1}\varphi_t} \right], \tag{5}$$

where $\varepsilon_t = y_t - \varphi_t^T \hat{\theta}_{t-1}$ is the a priori prediction error, $G_t$ is
the filtering gain, $P_t$ is the estimation covariance matrix, and
$\lambda_t$ is the forgetting factor.
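A minimal sketch of one RLS update in the standard exponentially weighted form (gain, a priori error, covariance update) may help fix the notation. It uses plain Python lists, assumes a constant forgetting factor `lam`, and assumes that `P` is symmetric; the helper `rls_identify` and the initial value `p0` are hypothetical conveniences.

```python
def rls_step(theta, P, phi, y, lam=0.98):
    """One RLS update; returns (theta_t, P_t, G_t, eps_t)."""
    n = len(theta)
    # A priori prediction error eps_t = y_t - phi_t^T theta_{t-1}.
    eps = y - sum(p * th for p, th in zip(phi, theta))
    # P_{t-1} phi_t and the scalar lam + phi_t^T P_{t-1} phi_t.
    Pphi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
    denom = lam + sum(phi[i] * Pphi[i] for i in range(n))
    G = [v / denom for v in Pphi]                      # filtering gain
    theta_new = [theta[i] + G[i] * eps for i in range(n)]
    # P is symmetric, so phi^T P equals (P phi)^T.
    P_new = [[(P[i][j] - G[i] * Pphi[j]) / lam for j in range(n)]
             for i in range(n)]
    return theta_new, P_new, G, eps

def rls_identify(samples, n, lam=1.0, p0=1e4):
    """Run RLS over (phi, y) pairs starting from theta = 0, P = p0 * I."""
    theta = [0.0] * n
    P = [[p0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for phi, y in samples:
        theta, P, _, _ = rls_step(theta, P, phi, y, lam)
    return theta
```

With noise-free data and a large initial covariance, the estimate converges to the true parameters after a few informative regressors.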

When the RLS algorithm ((3)-(5)) is used for tracking a
rapidly changing system, the estimation covariance matrix $P_t$
or $P_{t-1}$ can be increased at the locations of jumping points so
that the filtering gain is increased significantly to track
the rapidly changing components [2]. When using this method,
the jumping points need to be known a priori, which is
commonly unrealistic in practice. Therefore, a recursive
parameter change detection algorithm is required to identify
the locations of jumping points on-line. Some recursive change
detection algorithms have been developed in [6]-[8]. An attractive
method among them is the one used by Trigg and Leach (T
& L) [6]. In this method, two filtered versions of the prediction error
signal $\varepsilon_t$ are used,

$$\varepsilon_t^0 = (1-\gamma)\,\varepsilon_{t-1}^0 + \gamma\,\varepsilon_t, \tag{6}$$

$$\varepsilon_t^1 = (1-\gamma)\,\varepsilon_{t-1}^1 + \gamma\,|\varepsilon_t|, \tag{7}$$

where |.| denotes absolute value and $\gamma$ takes a small positive value
very close to 0 (commonly 0.005 < $\gamma$ < 0.05). The
T & L detection signal is defined as [6]

$$d_t = \frac{\varepsilon_t^0}{\varepsilon_t^1}. \tag{8}$$


According to the central limit theorem, $d_t$ is asymptotically
Gaussian distributed. It is shown in [6] and [7] that, for small
$\gamma$, $d_t$ is a zero-mean signal with variance approximately

$$\mathrm{Var}(d_t) = E(d_t^2) \approx \frac{\pi\gamma}{2(2-\gamma)}. \tag{9}$$

The detection signal $d_t$ will hence fluctuate around zero when no
change has occurred. If the true parameters change, successive
prediction errors are likely to have the same sign and hence $|d_t|$
will increase. Assume the detection threshold is $\tau$ (which can be
evaluated using Chebyshev's inequality and (9), see [7]). When
$|d_t| > \tau$ at time index t, a parameter change is considered to have
happened, and $P_t$ or $P_{t-1}$ is increased by $A_t$ at this
time index [2].
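The two smoothing recursions (6) and (7) and the resulting detection signal can be sketched as below. The sketch assumes the detection signal is the ratio of the smoothed error to the smoothed absolute error, which is the usual Trigg and Leach tracking signal, and it assumes zero initial conditions; the function name is hypothetical.

```python
def trigg_leach(errors, gamma=0.02):
    """Return the tracking signal d_t for a sequence of prediction errors."""
    e0 = 0.0  # exponentially smoothed error, as in (6)
    e1 = 0.0  # exponentially smoothed absolute error, as in (7)
    d = []
    for eps in errors:
        e0 = (1.0 - gamma) * e0 + gamma * eps
        e1 = (1.0 - gamma) * e1 + gamma * abs(eps)
        # Guard against an all-zero history, where the ratio is undefined.
        d.append(e0 / e1 if e1 > 0.0 else 0.0)
    return d
```

Errors that keep the same sign drive the ratio to plus or minus one, while errors that alternate in sign keep it close to zero, which is the behaviour the threshold test relies on.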

The merit of this detector is that it is computationally simple
and recursive, and that the variance of the detection signal does not
depend on the prediction error signal. However, there exists a trade-off
between the false alarm probability and the detection probability
of the T & L detector. If we want to decrease the false alarm
probability, we must increase the detection threshold. This will
increase the missed-alarm probability and thus decrease the detection
probability. In the following subsection, we will develop a
wavelet domain change detection algorithm which can achieve
much higher detection probability.

B.  Recursive  DWT  algorithm  and  multiscale  product  sequence 

The fast algorithm of the DWT has been implemented via discrete
digital filters in [11], [12]. However, it is an iterative procedure
which can only process a batch of signals. For detecting
changing points on-line, a recursive DWT algorithm should be
developed, which can be summarized as the following theorem.

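Theorem 1: The causal DWT coefficient $z_t(j)$ of $f_t$ at time t
and scale j can be calculated as

$$z_t(j) = W_{2^j} f_t = \sum_{k=0}^{2^j-1} h_k^e(j)\, f_{t-k}, \tag{10}$$

where $h_k^e(j)$, $k = 0, \ldots, 2^j - 1$, is an equivalent causal filter for scale
j. Further, $z_t(j)$ can be recursively calculated as

$$\Omega^{(t)}(j) = \tilde{\Omega}^{(t-1)}(j) + f_t H_e(j), \tag{11}$$

$$z_t(j) = z_t^{(t)}(j). \tag{12}$$

Here $\Omega^{(t)}(j)$, $\tilde{\Omega}^{(t-1)}(j)$, and $H_e(j)$ are column vectors of size
$2^j \times 1$ which are defined as:

$$\Omega^{(t)}(j) = [z_t^{(t)}(j), \ldots, z_{t+2^j-1}^{(t)}(j)]^T, \tag{13}$$

$$\tilde{\Omega}^{(t-1)}(j) = [z_t^{(t-1)}(j), \ldots, z_{t+2^j-2}^{(t-1)}(j), 0]^T, \tag{14}$$

$$H_e(j) = [h_0^e(j), \ldots, h_{2^j-1}^e(j)]^T, \tag{15}$$

and $z_k^{(t)}(j)$, $1 \le k \le t + 2^j - 1$, denote the causal DWT sequence
of $F_t = [f_1, \ldots, f_t]$ using filter $h_k^e(j)$.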

The proof of the above theorem is omitted here to save space.
See [14] for details. It shows that $z_t(j)$ at time t can be recursively
calculated on-line using (11) and (12) once a new data
sample $f_t$ arrives. In Table 1, we have listed the filter coefficients
of $h_k^e(j)$ for scales j = 1 to 4.
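One way to read the recursion in (11) and (12) is as an overlap-add update: each new sample adds its weighted contribution to a short vector of pending outputs, the first entry of which is then complete. The sketch below follows that reading; the class name, the buffer handling and the reference function `direct_filter` are assumptions made for illustration, and `H1` is the scale-1 entry of Table 1.

```python
class RecursiveCausalFilter:
    """On-line causal FIR filtering, one sample at a time."""

    def __init__(self, coeffs):
        self.h = list(coeffs)
        self.pending = [0.0] * len(self.h)  # partial sums of future outputs

    def update(self, sample):
        # Add the new sample's contribution to every pending output.
        self.pending = [p + sample * h for p, h in zip(self.pending, self.h)]
        # The first entry now holds the finished output for the current time.
        z = self.pending[0]
        # Shift the buffer and append a zero for the newest future slot.
        self.pending = self.pending[1:] + [0.0]
        return z

def direct_filter(coeffs, samples):
    """Batch reference: z_t = sum_k h_k f_{t-k}, with f = 0 before the start."""
    return [sum(coeffs[k] * samples[t - k]
                for k in range(len(coeffs)) if t - k >= 0)
            for t in range(len(samples))]

H1 = [-1.3333, 1.3333]  # scale-1 coefficients from Table 1
```

Feeding samples one at a time through `update` reproduces the batch convolution, at a cost per sample proportional to the filter length.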

The multiscale product of the first K scale sequences in the wavelet
domain at time index t is defined as

by the original detection signal. Motivated by the above discussion,
a new wavelet jump detector is proposed in the following
subsection for on-line change detection.

C.  Wavelet  jump  detector  for  on-line  abrupt  change  detection 

Denote by $z_t(j)$ the causal DWT of the T & L detection signal
$d_t$ (8) at time t and scale j. That is,

$$z_t(j) = W_{2^j} d_t, \tag{17}$$

which can be recursively calculated using (11) and (12). The multiscale
product signal of the first K scales can thus be calculated
as

$$\xi_t = \prod_{j=1}^{K} z_t(j). \tag{18}$$

Define a new (multiscale product) detection signal $\zeta_t$ by filtering
$\xi_t$ as follows:

$$\zeta_t = (1-\eta)\,\zeta_{t-1} + \eta\,\xi_t, \tag{19}$$

where $\eta$ is an exponential smoothing factor which commonly
takes a value in 0.05 < $\eta$ < 0.13. Although $\xi_t$ is heavy-tailed
and non-Gaussian distributed, $\zeta_t$ obtained above is approximately Gaussian
distributed according to the central limit theorem. Now
a new wavelet detector can be formed as

$$\hat{d}_t = \sqrt{\frac{\mathrm{Var}(d_t)}{\mathrm{Var}(\zeta_t)}}\;\zeta_t. \tag{20}$$


If $d_t$ is a Gaussian distributed signal, $\hat{d}_t$ is also a
Gaussian distributed signal whose variance is the same as that
of $d_t$. However, if $d_t$ has some local maxima (minima) which
correspond to abrupt changes of the original signal, these
local maxima (minima) will be enlarged and sharpened in $\hat{d}_t$.
This characteristic can be employed to provide a
more robust and accurate identification of possible abrupt
changes. That means that if we choose the detection threshold $\hat{\tau}$ of
the wavelet detection signal $\hat{d}_t$ equal to $\tau$ of the T & L detection
signal $d_t$, we can achieve a much higher detection probability.
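The chain from per-scale coefficients to the rescaled detection signal can be sketched in three small steps: the product over scales, exponential smoothing with factor η, and rescaling so that the result has the variance of the T & L signal. The function names are hypothetical, zero initial conditions are assumed, and both variances are taken as known inputs here.

```python
import math

def multiscale_product(z_scales):
    """Product of the wavelet coefficients z_t(1), ..., z_t(K) at one time t."""
    prod = 1.0
    for z in z_scales:
        prod *= z
    return prod

def smooth(values, eta=0.10):
    """Exponential smoothing: zeta_t = (1 - eta) * zeta_{t-1} + eta * value_t."""
    zeta, out = 0.0, []
    for v in values:
        zeta = (1.0 - eta) * zeta + eta * v
        out.append(zeta)
    return out

def rescale(zeta, var_d, var_zeta):
    """Scale zeta so that its variance matches var_d."""
    return [math.sqrt(var_d / var_zeta) * z for z in zeta]
```

Because the rescaled signal has the same variance as the T & L signal, the same threshold can be reused for both, which is the comparison the text makes.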

To obtain the new wavelet detector $\hat{d}_t$, the variance of $\zeta_t$,
$\mathrm{Var}(\zeta_t)$, must be estimated in (20). One method is as follows.
$\mathrm{Var}(\zeta_t)$ can be recursively estimated as


$$\xi_t = \prod_{j=1}^{K} z_t(j). \tag{16}$$

The wavelet used for the DWT in this paper is chosen as the first-order
derivative of a smooth function (a cubic spline function,
see [11]). The multiscale product sequence $\xi_t$ sharpens and enhances
the modulus maxima which are dominated by the signal
edges and at the same time suppresses the modulus maxima
which are dominated by noise. It has been further shown that
the probability density function (PDF) of a multiscale product
sequence is heavy-tailed compared with that of a Gaussian
distributed one with the same variance [13]. Employing these
characteristics, a DWT multiscale product sequence of an existing
detection signal (for example, one obtained from a T & L detector)
can be used as a new detection signal. It will enhance the
components representing possible abrupt changes in the original
detection signal, and thus a larger detection threshold can
be used, which will lead to a smaller false alarm probability.
At the same time, it will suppress the noise interference components
in the original detection signal, which will decrease the
missed-alarm probability and thus increase the detection probability
when using the same detection threshold as the one used

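$$v_t = (1-\rho)\,v_{t-1} + \rho\,\zeta_t^2, \tag{21}$$

and the variance of $d_t$ is asymptotically given by (9); thus $\hat{d}_t$ can be
estimated as $\tilde{d}_t$,

$$\tilde{d}_t = \sqrt{\frac{\pi\gamma}{2(2-\gamma)}}\;\frac{\zeta_t}{\sqrt{v_t}}, \tag{22}$$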

where $\rho$ is another exponential smoothing factor which takes a
small positive value very close to 0 (commonly 0.01 < $\rho$ < 0.03).
For estimating the variance of $\zeta_t$, (22) is asymptotically efficient
but very sensitive to the choice of $\rho$. Moreover, the variance
estimation by (22) is heavily affected by the density of
jumping points. To overcome these problems, a better method
is proposed below, based on an empirical equation, to estimate
the variance of $\zeta_t$.
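The recursive normalization can be sketched as follows. This is an illustration only: it assumes the running variance estimate is an exponentially weighted average of the squared smoothed signal, it takes the target variance `var_d` as an input rather than fixing its formula, and the small positive start value `v0` is a hypothetical guard against division by zero.

```python
import math

def normalize_online(zeta, var_d, rho=0.02, v0=1e-12):
    """Rescale zeta_t on-line so that its variance approaches var_d."""
    v, out = v0, []
    for z in zeta:
        # Running estimate of Var(zeta_t), assuming zeta_t has zero mean.
        v = (1.0 - rho) * v + rho * z * z
        out.append(math.sqrt(var_d / v) * z)
    return out
```

The dependence on `rho` is visible directly: a small value gives a smooth but slowly adapting estimate, and a jump in the signal inflates `v` for roughly `1 / rho` samples afterwards, which is consistent with the sensitivity noted in the text.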

Assuming  the  ratio  of  the  variance  of  new  wavelet  multiscale 
product  detection  signal  to  the  variance  of  the  T  &  L  detection 
signal  (R-W-TL)  as 


R-W-TL  = 


Var(Ct) 

Var(dt) 


=  exp 


m 

+  KTJ  +  T  -  ^ 


;=o 

(23) 
and the wavelet detector (20) using the empirical variance ratio
estimation (23) can be represented as $\tilde{d}_t$,

$$\tilde{d}_t = \frac{\zeta_t}{\sqrt{\text{R-W-TL}}}, \tag{24}$$
where the values of $\kappa$, $\tau$, and the polynomial coefficients can be
estimated by applying the least-squares method to experimental data
obtained through Monte Carlo simulations. Recommended values
are $\kappa = -5.1043$ and $\tau = 1.0026$, and the order m = 9
polynomial  coefficients  are  v  =  {1.272  x  1011,— 5.9549  x 
10lb,  1.179  x  1010, -1.2877  x  10n,  8.4984  x  107, -3.5005  x 
106, 9.0861xl0'\  -1.5622  xlO3, 2.5120x10’}.  Extensive  simu¬ 
lations  have  verified  that  the  empirical  equations  (23)  and  (24) 
are  effective  and  produce  quite  accurate  results.  See  [14]  for 
details. 


D. Selective tracking of rapidly changing systems using
wavelet jump detectors

Normally, different branches of the system parameters are not always
subject to abrupt changes at the same time when a jump
occurs in a time-varying system. When we modify the matrix
$P_{t-1}$ or $P_t$ (in (4) or (5)) with $A_t$, it is common to select $A_t$
as a diagonal matrix where each diagonal element reflects the
change of the corresponding parameter branch. When one or
several branches have changed rapidly at a specific time t, the
corresponding elements in $A_t$ should be increased while the remaining
elements should be kept unchanged [2]. This requires that
the jump detector not only identify the locations where the
jumps have happened but also determine the branches producing
these jumps. The a priori prediction error signal is used to
construct the jump detector of [6], [7], which (named the prediction
detector) can only determine where a jump happens in a
time-varying system. To judge which branches produced this jump,
a set of jump detectors can be constructed directly from the
estimated filtering gains (named gain detectors). Combining
the prediction detector with the gain detectors, a new selective
wavelet detector is proposed in the following, which can determine
not only the locations of jumping points but also the
branches that have produced the jumps.

Assume a wavelet detector $\tilde{d}_t^{\,p}$ (prediction detector) is obtained
from the a priori prediction error signal $\varepsilon_t = y_t - \varphi_t^T \hat{\theta}_{t-1}$
(3). Assume N other wavelet detectors $\tilde{d}_t^{\,1}, \ldots, \tilde{d}_t^{\,N}$
(gain detectors) are obtained from the estimated filtering gains
$G_t(1), \ldots, G_t(N)$ (4), respectively. Without loss of generality,
we assume here that a system jumping change at a specific time is
produced by an abrupt change of only one parameter branch
(the case of several parameter branches changing at the same
time is a simple extension). The proposed selective wavelet
detector uses both the prediction detector and the gain detectors
for parameter change detection. More explicitly, an abrupt
change is considered to be detected at the i-th ($i = 1, \ldots, N$)
parameter branch at time t if

$$|\tilde{d}_t^{\,p}| > \hat{\tau} \quad \text{and} \quad |\tilde{d}_t^{\,i}| > \hat{\tau}, \tag{25}$$

where the detection threshold $\hat{\tau}$ can be determined from (9).

In summary, we list the complete RLS algorithm using
estimation covariance matrix modification and a selective wavelet
detector (abbreviated as RLS-MSWD) at time t as follows:

• (a). RLS algorithm

Use (3)-(5) to calculate $\hat{\theta}_t$, $\varepsilon_t$, $G_t$ and $P_t$;

• (b). Selective wavelet detector for change detection

- (b1). From $\varepsilon_t$, calculate (6)-(8), (17) (implemented with
(11) and (12)), (18), (23), and (24) to get the prediction
detector $\tilde{d}_t^{\,p}$;

- (b2). For i = 1:N {Using $G_t(i)$ instead of $\varepsilon_t$ in (6) and (7),
calculate the equations as in (b1) to get the i-th gain detector
$\tilde{d}_t^{\,i}$} End

- (b3). Use (25) to detect whether a jumping change has happened.
If yes, determine which parameter branch produced this change,
set $A_t$ and go to (c); otherwise t = t + 1 and go to (a);

• (c). Estimation covariance matrix modification
Modify $P_{t-1}$ or $P_t$ in (4) or (5). t = t + 1 and go to (a).
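Steps (b3) and (c) of the list above can be sketched as two small helpers. The sketch assumes a single common threshold for the prediction and gain detectors and a fixed diagonal increment `delta` for every flagged branch; both simplifications, and the function names, are hypothetical.

```python
def detect_branches(d_pred, d_gain, threshold):
    """Return indices of parameter branches flagged as having jumped.

    d_pred -- current value of the prediction detector
    d_gain -- current values of the N gain detectors
    """
    # No jump is declared unless the prediction detector fires.
    if abs(d_pred) <= threshold:
        return []
    return [i for i, d in enumerate(d_gain) if abs(d) > threshold]

def boost_covariance(P, branches, delta):
    """Add delta to the diagonal entries of P for the flagged branches only."""
    P_new = [row[:] for row in P]
    for i in branches:
        P_new[i][i] += delta
    return P_new
```

Only the flagged diagonal entries grow, so the filtering gain is raised for the branches that jumped while the remaining branches keep their current gain.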


Fig. 1. Comparison of the wavelet detector with the T & L detector.
Fig. 2. An ARX(2,1) abruptly changing system identification by the
proposed RLS-MSWD algorithm: (a) $b_t(1)$, (b) $b_t(2)$, (c) $c_t(1)$.


III.  Simulation  Results 

In figure 1, a wavelet detector is compared with a T & L
detector. (a) shows a stationary white Gaussian noisy signal
by the RLS-MTLD method in figure 3 are disturbed and affected
by each other.


Fig. 3. An ARX(2,1) abruptly changing system identification by the
RLS-MTLD algorithm: (a) $b_t(1)$, (b) $b_t(2)$, (c) $c_t(1)$.


which has three abrupt changes in the vicinity of time locations
100, 700, and 1500, respectively. The amplitudes and shapes of the
three abrupt changes are shown in (b). In (c), the solid line represents
the T & L detection signal and the dotted line represents
the wavelet detection signal (24) obtained using the theoretical
R-W-TL (23). (d) shows the same trace as the one represented
by the dotted line in (c), i.e., the wavelet detection signal
obtained using the theoretical R-W-TL (wavelet decomposition
scale number K = 3). (e) shows the wavelet detection signal
(22) obtained using the recursive variance estimation (21).
Comparing the wavelet detection signal with the T & L detection
signal in (c), the former provides a sharper and more
accurate indication of the abrupt changing points, which is
very important for detecting small-amplitude and/or concentrated
abrupt changes. Comparing (d) with (e), it can be seen
that the detector in (e) is asymptotically consistent with the
one in (d) as the recursive estimation of the variance becomes
more and more accurate.

An ARX(2,1) system

$$y_t = b_t(1)\,y_{t-1} + b_t(2)\,y_{t-2} + c_t(1)\,u_{t-1} + e_t,$$

is used for verifying the performance of the proposed abrupt
change tracking algorithm. Here the system parameters $b_t(1)$
and $b_t(2)$ both have abrupt changes and $c_t(1)$ is constant,
as shown in figure 2. The identification results of the proposed
RLS-MSWD are shown in figure 2, where $\gamma = 0.02$, $\eta = 0.10$,
K = 3, and the empirical formulas (23) and (24) are used for
producing the wavelet detectors. (Solid lines represent tracking
results and dotted lines represent true values.) It can be seen
that the estimates coincide with the true parameter values very
well. For comparison, identification results of the RLS algorithm
using the T & L detector (abbreviated as RLS-MTLD) are
shown in figure 3. Since the T & L detector is not as sensitive
to the abrupt changes as the selective wavelet detector,
the RLS-MTLD method cannot
track abrupt changes with small amplitude (see $b_t(2)$ between
time index 330 and 500) or concentrated abrupt changes (see
$b_t(1)$ between time index 570 and 620) in figure 3. Moreover,
from figure 2 we can see that the proposed RLS-MSWD method
can selectively track the abrupt changes of different parameter
branches, while the estimates of different parameter branches

IV.  Conclusions 

In this paper, the problem of tracking abruptly changing systems
has been tackled. A new on-line wavelet detector has been
proposed which is computationally simpler and can achieve
much higher detection probability than commonly used abrupt
change detection methods. Selective tracking of the rapidly changing
parameter branches via estimation covariance modification at
the jumping points has also been discussed.
Appendix

Table 1: Filter coefficients used for causal DWT ($1 \le j \le 4$)


hek{  1)  =  {-1.3333,1.3333}; 

h%{ 2)  =  {-4.4643  x  10~\ -2.2321  x  HT1, 4.4643  x  10~\ 

2.2321  x  10~ 1 }; 

hek(3)  =  {-3.0941  x  10~2,  -5.0278  x  HT2,  -3.8676  x  10-2, 

-2.7073  x  10-2, 1.2136  x  10_1, 9.1019  x  10~2, 
6.0680  x  10-2, 3.0340  x  10~2}; 

h%{ 4)  =  {-1.2136  x  10“\ -1.5170  x  HT1,  -6.0680  x  10-2, 

3.0340  x  10-2, -1.5470  x  10~2, -3.8676  x  10~3, 
7.7351  x  10-3, 1.9338  x  10-2, 3.0941  x  10~2, 

2.7073  x  10“2, 2.3205  x  10“2, 1.9338  x  10~2, 

1.5470  x  HT2, 1.1603  x  10~2, 7.7351  x  10“3, 

3.8676  x  10“3}. 
ABSTRACT

This  paper  introduces  a  novel  semiblind  approach  to  space-time 
linear  detection  in  multiple-access  systems.  A  new  criterion  for 
the  selection  of  the  linear  receiver  coefficients,  based  on  the 
Maximum  Likelihood  (ML)  principle,  is  derived  and  a  practical 
implementation  by  means  of  a  fast  Expectation-Maximization 
(EM)  algorithm  is  suggested.  The  semiblind  criterion  is  obtained 
from  a  purely  statistical  point  of  view  where  the  aim  of 
training  data  is  not  to  enhance  performance  but  to  eliminate 
misconvergence  problems. 

1.  INTRODUCTION 

It has been recently shown that deploying multiple transmitting
and receiving antennae can substantially improve the capacity of
multipath wireless channels if the rich time-scattering propagation
is properly exploited. Space-Time Coding (STC) is a novel
proposal that combines channel coding techniques suitable for
multiple transmitting elements with signal processing algorithms
that exploit the spatial and temporal diversity at the receiver [1,2].

In this paper, we focus on the signal processing issue of soft
detection as a prior step to channel decoding. We introduce a
novel semiblind criterion based on the Maximum Likelihood (ML)
principle that inherently exploits any existing spatio-temporal
structure induced by the space-time encoder in order to linearly
estimate the transmitted symbols. Note that estimating the stream
of symbols transmitted from the j-th antenna involves removing
the Inter-Symbol Interference (ISI), due to the channel time-scattering,
and the Multiple Access Interference (MAI), due to
the other symbol streams. The method is termed semiblind
because it exploits not only the a priori knowledge of part of
the transmitted symbols but also the information bearing symbols
transmitted from a single antenna. A remarkable feature of the new
criterion, when compared to other semiblind approaches which
are basically ad hoc or heuristic methods [3], is that it is derived
from a purely statistical point of view. A fast iterative algorithm,
derived from the Expectation-Maximization (EM) [4] framework,
is suggested as a means of practical implementation. In this
algorithm, symbols known a priori are extremely useful to avoid
the misconvergence problems so typical of ML methods.

The remainder of the paper is organized as follows. Section
2 describes the system model. The novel ML-based criterion is
presented in section 3. The iterative EM algorithm is derived
in section 4 and its performance is evaluated, through computer
simulations, in section 5. Finally, section 6 is devoted to the
conclusions.

2.  SYSTEM  AND  SIGNAL  MODEL 

Figure 1 shows the basic building blocks of a wireless
communication system with Space-Time (ST) coding capabilities
[5]. The bit stream to be transmitted, $\{b(l)\}_{l=0,1,\ldots}$, is fed into
a temporal coding stage followed by a Serial to Parallel (S/P)
converter that creates some desired spatio-temporal structure. A
bank of N identical Waveform Encoders (WE) and transmitting
antennae yields the information bearing signals to be transmitted,
$s_1(t), \ldots, s_N(t)$. Transmission is carried out in bursts of
$NK \log_2 A$ bits, i.e., K complex symbols per antenna with $\log_2 A$
bits per symbol. Multipath propagation between each transmitting
and receiving antenna results in a Multiple Input Multiple Output
(MIMO) channel with Inter-Symbol Interference (ISI). A bank
of L > N matched filters, sampled at the symbol rate, is
employed at the receiver to obtain a set of sufficient statistics,
$x_1(n), \ldots, x_L(n)$, $n = 0, \ldots, K-1$. An adequately chosen
linear processor, consisting of a bank of linear Finite Impulse
Response (FIR) filters, provides soft estimates, $y_1(n), \ldots, y_N(n)$,
$n = 0, \ldots, K-1$, of the complex transmitted symbols, which we
denote as $s_1(n), \ldots, s_N(n)$, $n = 0, \ldots, K-1$. A Parallel to
Serial (P/S) converter and a channel decoder yield hard estimates
of the transmitted information bits.

Assuming a linear memoryless modulation format is
employed, we obtain a linear signal model for the discrete-time
signals observed after the bank of symbol rate samplers during the
n-th symbol period

$$\mathbf{x}(n) = \sum_{i=0}^{m-1} \mathbf{H}(i)\,\mathbf{s}(n-i) + \mathbf{g}(n) = \mathbf{H}\,\bar{\mathbf{s}}(n) + \mathbf{g}(n) \tag{1}$$

where $\mathbf{x}(n) = [x_1(n), \ldots, x_L(n)]^T$ is the vector of observations
obtained from the bank of receivers, $\mathbf{s}(n) = [s_1(n), \ldots, s_N(n)]^T$
is the n-th vector of transmitted symbols, $\bar{\mathbf{s}}(n) = [\mathbf{s}^T(n-m+1) \cdots \mathbf{s}^T(n)]^T$
is an Nm x 1 vector containing the data
components received during the n-th symbol period due to the
ISI, $\mathbf{g}(n)$ is an L x 1 vector of Additive White Gaussian Noise
(AWGN) components with zero mean and covariance matrix
$E[\mathbf{g}(n)\mathbf{g}^H(n)] = \sigma_g^2 \mathbf{I}_L$ ($\mathbf{I}_L$ being the L x L identity matrix),
and $\mathbf{H} = [\mathbf{H}(m-1) \cdots \mathbf{H}(0)]$ is the L x Nm matrix
that contains the discrete-time channel coefficients resulting from
symbol-rate sampling in the described structure. In more detail,
let the m x 1 vector $\mathbf{h}_{lp} = [h_{lp}(0), \ldots, h_{lp}(m-1)]^T$ represent
the discrete-time channel impulse response between the p-th
transmitting antenna and the l-th receiving antenna obtained after
symbol rate sampling, where m is the maximum length of the
impulse response and, as a consequence, the size of the ISI
window. The L x N building submatrices in the MIMO channel,
$\mathbf{H}(i)$, $i = 0, \ldots, m-1$, turn out to be

$$\mathbf{H}(i) = \begin{bmatrix} h_{11}(i) & h_{12}(i) & \cdots & h_{1N}(i) \\ h_{21}(i) & h_{22}(i) & \cdots & h_{2N}(i) \\ \vdots & \vdots & & \vdots \\ h_{L1}(i) & h_{L2}(i) & \cdots & h_{LN}(i) \end{bmatrix} \tag{2}$$

Fig. 1. Block diagram of the communication system.


The symbols of interest to be estimated are those in $\mathbf{s}(n)$. In
order to guarantee that the whole energy of this vector is processed,
let us stack the observations from m consecutive symbol periods
to obtain the extended signal model

$$\bar{\mathbf{x}}(n) = \mathcal{H}\,\mathbf{s}_m(n) + \bar{\mathbf{g}}(n) \tag{3}$$

where $\bar{\mathbf{x}}(n) = [\mathbf{x}^T(n) \cdots \mathbf{x}^T(n+m-1)]^T$ is the Lm x 1
observation vector, $\mathbf{s}_m(n) = [\mathbf{s}^T(n-m+1) \cdots \mathbf{s}^T(n+m-1)]^T$
is the vector of contributing symbols with dimensions
N(2m - 1) x 1, $\bar{\mathbf{g}}(n) = [\mathbf{g}^T(n) \cdots \mathbf{g}^T(n+m-1)]^T$ is
an AWGN vector and the extended channel matrix $\mathcal{H}$ has the
block-diagonal form


3.  SELECTION  OF  THE  RECEIVER  COEFFICIENTS 

The  problem  of  selecting  matrix  W  can  be  split  into  N  simpler 
problems,  i.e.,  W  =  [wi,  •  •  • ,  Wj,  •  •  • ,  w»],  j  =  1,  •  •  • ,  TV 
where  Wj  is  the  Lm  x  1  FIR  filter  that  provides  the  estimate 
yj(n)  =  w fx(n)  corresponding  to  the  symbol  from  the  j-th 
transmitting  antenna.  In  order  to  obtain  this  filter’s  coefficients, 
let  us  assume  that  w,  j  is  the  optimum  value  of  the  filter,  meaning 
that  it  removes  both  the  ISI  and  the  MAI,  leaving  only  a  residual 
Gaussian  interference.  Hence,  we  can  write 

Vi  («)  =  x(n)  =  sj  (n)  +  gftj  (n)  (5) 

where  Sj(n)  is  the  desired  symbol  and  gjj(n)  is  a  complex 
Gaussian  random  variable  with  zero  mean  and  variance  a2,  = 

o  h  * 

ffjW.jW,  j .  For  the  sake  of  simplicity,  the  filtered  noise  variance, 
ojj,  will  be  considered  a  known  constant  in  the  subsequent 
derivations,  but  an  easy-to-implement  estimation  algorithm  will  be 
proposed  in  the  next  section. 

When  a  block  of  K  observation  vectors  is  available  at  the  j- 
th  receiver,  and  the  symbols  transmitted  through  the  ji-th  antenna 
are  i.i.d.,  it  can  be  shown  that  the  joint  probability  density  function 
(p.d.f.)  of  the  resulting  soft  estimates,  yj  =  [yj  (0), . . .  ,y3(I<  — 
1)]T,  is  [6]  (asuming  white  filtered  noise) 


Jy ,  ~ 


K  K-l 


n*. 

n=0 


e  "/•i 


(6) 


’  HT(m  -  1) 

0 

0 

HT{m  -  2) 

HT{m  -  1) 

0 

KT{  0) 

KT(  1) 

HT(m-  1) 

KT(  0) 

iLT(  1) 

0 

0 

•••  Ht(  0)  J 

and  dimensions  Lm  x  N(2m  —  1). 

An  N  x  1  vector  of  soft  estimates, 
y (n)  =  [yi  (n) ,  •  •  • ,  yN  (n)]T,  corresponding  to  the  symbols  in 
s(n),  is  obtained  through  linear  processing  as 

y(n)  =  W  Hx.(n)  (4) 

where  W  is  an  Lm  x  N  matrix  filter  and  H  denotes  Hermitian 
transposition. 


where  Es[  ]  denotes  the  statistical  expectation  with  respect  to 
(w.r.t.)  the  desired  symbol.  Since  the  symbols  belong  to  a  finite 
alphabet  of  .4  elements,  this  expectation  can  be  easily  converted 
into  an  addition  of  A  terms.  Using  (6),  the  ML  estimate  of  w,  j 
turns  out  to  be 


w j  =  arg  max  < 

r  k- i 

=  ^2 log  Es 

\Vj(n)-s\2 

w*.i 

[  n~  0 

(7) 

where  £(w,,, )  is  the  log-likelihood  of  w7  w.r.t.  the  observed 
soft-estimates  y3.  Since  the  linear  filters,  w_,  j  =  1,  are 
chosen  so  as  to  ensure  that  the  p.d.f.  of  y(n)  is  the  desired 
one,  the  proposed  criterion  implicitly  exploits  any  spatio-temporal 
structure  created  among  the  symbol  substreams  in  order  to  remove 
the  interferences. 
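
As an illustration of the stacked model (3), the following sketch builds the extended channel matrix from the taps H(0), ..., H(m-1) and can be checked against a direct convolution. The function name `extended_channel` and the list-of-taps input are assumptions made here for illustration; they do not come from the paper.

```python
import numpy as np

def extended_channel(H_taps):
    """Build the Lm x N(2m-1) extended channel matrix of model (3).

    H_taps is the list [H(0), ..., H(m-1)] of L x N tap matrices
    (hypothetical input layout chosen for this sketch).
    """
    m = len(H_taps)
    L, N = H_taps[0].shape
    H_ext = np.zeros((L * m, N * (2 * m - 1)), dtype=complex)
    for r in range(m):          # block row r produces x(n + r)
        for i in range(m):      # tap H(i) multiplies s(n + r - i)
            c = (m - 1) + r - i  # block column holding s(n - m + 1 + c)
            H_ext[r * L:(r + 1) * L, c * N:(c + 1) * N] = H_taps[i]
    return H_ext
```

With m = 1 the matrix collapses to H(0), which matches the ISI-free case used later in the simulations.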

Nevertheless, all the transmitting antennae use the same
modulation format and, as a consequence, solving the
optimization problem (7) may lead to the capture of the j-th
receiver by an interference, i.e., the estimation of a non-desired
sequence, $s_{i \neq j}(n)$. This limitation can be easily circumvented in
practice with the transmission of a short training sequence of M < K
symbols from each antenna. Conditioning the expectation in
(7) to $s_{j,t} = [s_j(0), \ldots, s_j(M-1)]^T$, we arrive at a semiblind
receiver where the filter coefficients are computed as

$$L(w_{*,j})|_{s_{j,t}} = -\sum_{n=0}^{M-1} \frac{|y_j(n) - s_j(n)|^2}{\sigma_{f,j}^2} + \sum_{n=M}^{K-1} \log E_s\left[ e^{-|y_j(n) - s|^2 / \sigma_{f,j}^2} \right]$$

$$\hat{w}_j = \arg\max_{w_{*,j}} \left\{ L(w_{*,j})|_{s_{j,t}} \right\}. \qquad (8)$$


Fig. 2. MSE for several values of the SNR. Simulation parameters:
N = 3, L = 4, m = 2.


The  quadratic  term  in  (8)  reshapes  the  log-likelihood  function  by 
enhancing  the  local  maximum  corresponding  to  the  desired  j-th 
symbol  stream  and  progressively  removing  (as  M  increases)  the 
other  non  desired  maxima. 

It  should  be  remarked  that  the  semiblind  criterion  (8)  is 
derived  from  a  purely  statistical  point  of  view,  whereas  most 
semiblind  criteria  proposed  so  far  are  obtained  in  a  rather  heuristic 
manner  by  regularizing  the  Least  Squares  (LS)  cost  function  for 
the  training  data  using  a  different  blind  cost  function  [3]. 


4.  ITERATIVE  IMPLEMENTATION 

Since it is not possible to find a closed-form solution to
problem (8), we resort to the EM algorithm [4] as a numerical
optimization approach. Let the j-th sequence of soft estimates,
$\{y_j(n)\}_{n=0,\ldots,K-1}$, be the observed or incomplete data and let the
j-th stream of symbols, $\{s_j(n)\}_{n=0,\ldots,K-1}$, be the hidden data,
according to the usual EM notation. Hence, the complete data
are given by the sequence $\{y_j(n), s_j(n)\}_{n=0,\ldots,K-1}$ and, taking
similar steps as in the standard derivation of the EM algorithm [4]
for the p.d.f. (6), we obtain the following iterative algorithm

$$\text{E step:} \quad U(w, w_j(i)) = -\sum_{n=0}^{M-1} |y_j(n) - s_j(n)|^2 - \sum_{n=M}^{K-1} E_{s_j(n)|y_j(n);w_j(i)}\left[ |y_j(n) - s_j(n)|^2 \right]$$

$$\text{M step:} \quad w_j(i+1) = \arg\max_{w} \{ U(w, w_j(i)) \}$$

where $E_{s_j(n)|y_j(n);w_j(i)}[\cdot]$ denotes statistical expectation w.r.t.
symbol $s_j(n)$ conditioned upon the corresponding soft estimate,
$y_j(n)$, and with parameter vector $w_j(i)$. Since function $U(\cdot,\cdot)$ is
purely quadratic, it presents a single maximum that can be found
analytically and it is possible to rewrite the above iteration as the
single updating rule in eq. (9). It can
be proved by means of standard EM theory [4] that the sequence
of filter updates obtained via (9) is non-decreasing in likelihood.
Notice, also, that algorithm (9) reduces to the closed-form LS
solution when M = K.

Fig. 3. MSE for several values of the SNR. Simulation parameters:
N = 6, L = 6, m = 1.

For the practical application of the iterative EM algorithm,
the conditional expectation in (9) must be evaluated. This can be
accomplished by means of the Bayes theorem as

$$E_{s_j(n)|y_j(n);w_j(i)}[s_j(n)] = \frac{\sum_{s} s \, e^{-|y_j(n) - s|^2 / \sigma_{f,j}^2}}{\sum_{s} e^{-|y_j(n) - s|^2 / \sigma_{f,j}^2}} \qquad (10)$$

where the sums run over the A symbols of the alphabet.
The expression on the right-hand side of (10) depends on the filtered
noise variance $\sigma_{f,j}^2$ which, in turn, is a function of the filter
coefficients. A simple updating rule for this parameter is

$$\sigma_{f,j}^2(i) = \sigma_g^2 w_j^H(i) w_j(i) \qquad (11)$$

where the input AWGN variance, $\sigma_g^2$, (or, equivalently, the power
spectral density of the channel noise) is assumed to be known a
priori.
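
The iteration formed by (9), (10) and (11) can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `em_semiblind`, its argument layout, the equiprobable alphabet and the choice of initial filter `w0` are all assumptions introduced here.

```python
import numpy as np

def em_semiblind(X, train, alphabet, sigma_g2, w0, n_iter=10):
    """Semiblind EM filter update for one symbol stream.

    X        : (Lm, K) matrix whose columns are the stacked observations
    train    : the first M known symbols of the stream (training data)
    alphabet : 1-D complex array of the A constellation points
    sigma_g2 : input AWGN variance, assumed known as in (11)
    w0       : initial filter (hypothetical choice left to the caller)
    """
    train = np.asarray(train, dtype=complex)
    M = len(train)
    R_inv = np.linalg.inv(X @ X.conj().T)            # (sum_n x(n) x^H(n))^{-1}
    w = np.asarray(w0).astype(complex)
    for _ in range(n_iter):
        y = w.conj() @ X[:, M:]                      # soft estimates y(n) = w^H x(n)
        sigma_f2 = sigma_g2 * np.real(w.conj() @ w)  # filtered noise variance (11)
        logp = -np.abs(y[None, :] - alphabet[:, None]) ** 2 / sigma_f2
        logp -= logp.max(axis=0)                     # avoid underflow in exp
        post = np.exp(logp)
        post /= post.sum(axis=0)                     # posterior of each symbol
        s_hat = alphabet @ post                      # conditional mean (10)
        s_all = np.concatenate([train, s_hat])
        w = R_inv @ (X @ s_all.conj())               # closed-form M step (9)
    return w
```

When all K symbols are supplied as training data, the loop returns the least-squares filter, in line with the remark that (9) reduces to the LS solution for M = K.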


$$w_j(i+1) = \left( \sum_{n=0}^{K-1} \bar{x}(n) \bar{x}^H(n) \right)^{-1} \left( \sum_{n=0}^{M-1} \bar{x}(n) s_j^*(n) + \sum_{n=M}^{K-1} E^*_{s_j(n)|y_j(n);w_j(i)}[s_j(n)] \, \bar{x}(n) \right) \qquad (9)$$

Fig. 4. Convergence rate of the EM algorithm. Simulation
parameters: N = 3, L = 4, m = 2, SNR = 12 dB.

Fig. 5. Convergence rate of the EM algorithm. Simulation
parameters: N = 6, L = 6, m = 1, SNR = 12 dB.


5. COMPUTER SIMULATIONS

In this section, we present computer simulations to illustrate the
performance of the proposed semiblind approach. As a figure of
merit for soft detection, we have chosen the Mean Squared Error
(MSE) of the estimates, defined as

$$\mathrm{MSE} = \mathrm{Trace}\left( (y(n) - s(n))^H (y(n) - s(n)) \right) \qquad (12)$$

where Trace(·) denotes the matrix trace operator. The MSE has
been measured for different channels and different values of the
average Signal to Noise Ratio (SNR) after sampling, given by

SNR  =  101og10  CTgTra^HH  K  where  a2s  =  ESj{n)  [|sj(n)|2] 
Vj,  n. 

Figure 2 shows the MSE for several SNR values achieved
in a system with N = 3 transmitting antennae, QPSK-modulated symbols,
L = 4 receiving antennae and a maximum length of the discrete-time
channel impulse response m = 2. The channel coefficients
in matrix H are modelled as i.i.d. complex Gaussian random
variables with zero mean and standard deviation $\sigma_h = 0.5$. The
results plotted in the figure have been obtained by averaging the
performance over 20 independent realizations of the whole matrix
H. We have considered that transmission is carried out in bursts
of K = 100 symbols per antenna, with training sequence length
M = 10. It is apparent that the proposed semiblind approach
performs close to the theoretical Minimum Mean Square Error
(MMSE) limit. This is the performance limit that would be
achieved by a linear MMSE detector constructed with perfect
knowledge of the channel. We have also plotted the MSE achieved
by a practical supervised MMSE soft detector implemented using
the Recursive Least Squares (RLS) algorithm [7] that is run for
the training sequences $s_{j,t}$, $j = 1, \ldots, N$, in order to compute
the filter coefficients. It can be seen that the performance of this
practical receiver is considerably worse than the theoretical one
because of the insufficient length of the training sequences.

These results are fully corroborated by an analogous
simulation experiment carried out for a system with N = 6
transmitting antennae, L = 6 receiving antennae and channel
length m = 1 (no ISI). The resulting curves are plotted in Figure
3.

Finally,  figures  4  and  5  illustrate  the  convergence  rate  of  the 
EM  algorithm  for  the  two  systems  considered  before.  Very  few 
iterations  are  enough  to  attain  MSE  convergence,  which  is  an 
important  advantage  if  real-time  constraints  have  to  be  fulfilled. 

6.  CONCLUSIONS 

We  have  presented  a  novel  semiblind  approach  to  space-time 
linear detection in wireless communication systems. An ML-based
semiblind criterion is applied for the selection of the linear
receiver  coefficients,  which  are  numerically  computed  by  means 
of  a  fast  iterative  EM  algorithm.  Unlike  other  semiblind  criteria, 
the  proposed  method  is  derived  from  a  purely  statistical  point  of 
view.  Training  data  reveal  themselves  as  extremely  useful  to  avoid 
the  typical  misconvergence  problems  of  ML  methods. 
ABSTRACT

Mobile radio communication systems are generally designed
without taking into account the relative emitter/receiver dynamics.
In this paper we model these dynamics as a vector
Markov process and formulate ranging and digital
demodulation/detection as aspects of recursive absolute (not
modulo 2π) phase estimation. Symbol-by-symbol detection
and phase tracking within the symbol interval are performed
by a bank of 'matched' stochastic nonlinear estimators and
a maximum a posteriori (MAP) decision algorithm. The
approach applies to precision landing and communication
with Low Earth Orbit (LEO) satellites or between rapidly
maneuvering platforms.

1.  INTRODUCTION 

Mobile radio communication systems are generally designed
without taking into account the relative emitter/receiver dynamics.
Doppler and Doppler rate estimates, necessary to
cope with accelerative trajectories, are generally obtained
with maximum likelihood (see [1] and references therein).
In reference [2] we considered the problem of carrier tracking
and symbol detection in Additive White Gaussian Noise
(AWGN); phase dynamics is modelled as a vector linear
Markov process, of which only the first component is observed.
As in [3], symbol-by-symbol detection and phase
tracking within the symbol interval are performed by a bank of
'matched' stochastic nonlinear estimators and a maximum
a posteriori (MAP) decision algorithm.

The results reported in [2] were limited to scalar phase
dynamics, namely Brownian motion, and constrained to the
interval [-π, π[. In this paper we formulate ranging and digital
demodulation/detection as aspects of recursive absolute
(not modulo 2π) phase estimation. In doing this, we build
on previous experience with this problem, see [4], [5]. The proposed
approach applies to precision landing and communication
with Low Earth Orbit (LEO) satellites or between
rapidly maneuvering platforms, among others.

2.  MODELLING  ASSUMPTIONS 

Consider mobile radio communications over an AWGN channel
with digital phase modulation. The received signal is
$z(t) = \cos(\omega_0 t + \theta(t)) + v(t)$ (the known carrier amplitude is
normalized to one), where $\omega_0$ is the nominal carrier frequency
(wavelength $\lambda = 2\pi c/\omega_0$), and $v(t)$ is white Gaussian noise
with spectral density $N_0/2$. The phase process $\theta(t)$ is the
sum of the digital information process $y(t)$ and the dynamics
phase $x_1(t)$, which takes into account the Doppler phase
shift due to relative emitter/receiver motion, and also oscillator
phase drifts.

We are interested in applications where the phase
$x_1(t) = 2\pi R(t)/\lambda$ (proportional to range $R(t)$) varies significantly
within the digital symbol interval. We describe this process
$x_1(t)$ as the first component of a vector Markov process
$x(t) \in \mathbb{R}^3$ modelled by

$$\dot{x}(t) = A_c x(t) + B_c u(t) \qquad (1)$$

where $u(t)$ is white Gaussian driving noise with spectral
density $q_c$. The components of vector $x(t)$ are proportional
to range, $x_1(t)$, velocity, $x_2(t)$, and acceleration, $x_3(t)$.
Prior knowledge or side information about these dynamics
is inserted in terms of matrices $A_c$ and $B_c$, noise variance
$q_c$, and initial condition $p(x(t_0))$.

The received signal $z(t)$ is down-converted to baseband
with reference to a local oscillator of nominal frequency $\omega_0$.
The sampled (normalized integrate-and-dump) in-phase
and quadrature components form the observation vector

$$z_n = [\cos(\theta_n) \;\; \sin(\theta_n)]^T + [v_{1,n} \;\; v_{2,n}]^T, \quad n = 1, 2, \ldots \qquad (2)$$

where $v_{1,n}$ and $v_{2,n}$ are zero-mean mutually independent
white Gaussian sequences with variance $r = r_c/\Delta$ ($r_c = N_0$).
The sampling interval $\Delta$ must be small enough to
guarantee that both discrete and continuous models describe
essentially the same process. For implementation and simulation
purposes we adopt the discrete version of (1),

$$x_{n+1} = A x_n + B u_n, \quad n = 1, 2, \ldots \qquad (3)$$

where $A = I + A_c \Delta$, $B = B_c \Delta$, and $u_n$ is a zero-mean
white Gaussian sequence with variance $q = q_c/\Delta$.
Signaling $y(t)$ will be presented in Section 4.
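
To make the discrete model (2)-(3) concrete, the sketch below simulates a phase trajectory and its noisy in-phase/quadrature observations. The triple-integrator matrices `A_c` and `B_c` are an assumption made here (they correspond to the double-integrated Brownian motion used later in the examples); the function names, seeds and default arguments are likewise hypothetical.

```python
import numpy as np

def simulate_dynamics(n_steps, delta, q_c, x0=(0.0, 0.0, 0.0), seed=0):
    """Simulate x_{n+1} = A x_n + B u_n with A = I + A_c*delta, B = B_c*delta."""
    # Assumed continuous-time model: phase, phase rate, acceleration,
    # with the driving noise entering the acceleration component only.
    A_c = np.array([[0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0],
                    [0.0, 0.0, 0.0]])
    B_c = np.array([0.0, 0.0, 1.0])
    A = np.eye(3) + A_c * delta
    B = B_c * delta
    q = q_c / delta                      # discrete driving-noise variance
    rng = np.random.default_rng(seed)
    x = np.empty((n_steps, 3))
    x[0] = x0
    for n in range(n_steps - 1):
        u = rng.normal(0.0, np.sqrt(q))
        x[n + 1] = A @ x[n] + B * u
    return x, A, B

def observe(theta, r, seed=1):
    """Observation model (2): [cos(theta), sin(theta)] plus noise of variance r."""
    rng = np.random.default_rng(seed)
    clean = np.stack([np.cos(theta), np.sin(theta)], axis=1)
    return clean + rng.normal(0.0, np.sqrt(r), size=clean.shape)
```

A typical call would be `x, A, B = simulate_dynamics(1000, 1e-4, 50.0)` followed by `z = observe(x[:, 0], r=0.4)`, where the numerical values are only placeholders.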

3.  RANGING 

Optimal estimation of $x_n$ involves the propagation of the
probability density function $F_n = P(x_n | Z_n)$, the filtering
density, conditioned on the set of past and present observations
$Z_n = \{z_1, \ldots, z_n\}$. This requires recursive application of
the Chapman-Kolmogorov equation and Bayes' law

$$\text{Prediction:} \quad P_n = S_n * F_{n-1} \qquad (4)$$

$$\text{Filtering:} \quad F_n = C_n H_n P_n \qquad (5)$$

where * denotes convolution and $C_n$ is a normalizing factor;
the convolution kernel $S_n = P(x_{n+1} | x_n)$, which expresses
the process dynamics (3), is Gaussian, given by

$$S_n \propto \mathcal{N}(x_{n+1} - A x_n, \; B q B^T) \qquad (6)$$

where $\mathcal{N}(u, V) = \exp\left(-(1/2) u^T V^{-1} u\right)$. The probability
density function $H_n$ (observation factor) is, according
to model (2), given by

$$H_n \propto \exp\left( \lambda_n \cos(x_{1,n} - \eta_0^{H_n}) \right) \qquad (7)$$

with

$$\lambda_n = \frac{1}{r} \sqrt{z_{1,n}^2 + z_{2,n}^2}, \qquad \eta_0^{H_n} = \arctan\frac{z_{2,n}}{z_{1,n}}. \qquad (8)$$

To implement (4) and (5) we need finite representations of the
involved probability density functions. This concerns, in this
problem, the periodic function $H_n$. As in [4], we represent
$H_n$ by

$$H_n \propto \sum_{i=-\infty}^{\infty} \mathcal{N}(x_{1,n} - \eta_i^{H_n}, \; \sigma^{H_n}), \qquad \eta_i^{H_n} = \eta_0^{H_n} + 2\pi i \qquad (9)$$

where $\sigma^{H_n}$ is obtained according to a minimum Kullback
distance criterion.
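
The parameters in (8) follow from expanding the Gaussian likelihood of model (2), and this can be verified numerically. The sketch below is illustrative only; the function name is hypothetical, and `arctan2` is used in place of a plain arctangent so that the quadrant of the phase is resolved.

```python
import numpy as np

def observation_factor_params(z, r):
    """Return (lambda_n, eta_0) of the observation factor (7)-(8)
    for one observation z = (z1, z2) with noise variance r."""
    lam = np.hypot(z[0], z[1]) / r      # (1/r) * sqrt(z1^2 + z2^2)
    eta0 = np.arctan2(z[1], z[0])       # four-quadrant version of arctan(z2/z1)
    return lam, eta0

def log_likelihood(x1, z, r):
    """Exact Gaussian log-likelihood of z given phase x1 (up to a constant)."""
    return -((z[0] - np.cos(x1)) ** 2 + (z[1] - np.sin(x1)) ** 2) / (2.0 * r)
```

Subtracting `lam * cos(x1 - eta0)` from `log_likelihood(x1, z, r)` gives a value that does not depend on `x1`, which is exactly the proportionality stated in (7).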


3.1.  Tracking 

Consider a prediction density $P_n \propto \mathcal{N}(x_n - \eta^{P_n}, V^{P_n})$,
and assume that only the mode of $H_n$ closest to $\eta_1^{P_n}$ contributes
significantly to the product (5). The filtering density
$F_n$ will be Gaussian, with mean $\eta^{F_n}$ and covariance matrix
$V^{F_n}$. The optimal estimate is then given by $\hat{x}_n = \eta^{F_n}$.

Fig. 1 shows the evolution of phase (range), phase rate
and acceleration, generating phase as a double integrated
Brownian motion with driving noise variance $q_c = 50$ rad² s⁻⁶
(which is chosen to encompass the dynamics of a typical
LEO satellite). Also shown are the estimates obtained by
the filter in tracking conditions, with perfectly matched
parameters and observation noise variance r = 0.3962 rad².


Fig. 1. Phase, phase rate and acceleration trajectories and
their corresponding estimates (r = 0.3962 rad² and $q_c$ =
50 rad² s⁻⁶). Horizontal axis: iteration n/10.


3.2.  Acquisition 

In the preceding example the estimates were initialized at
their nominal values. In general these are not exactly known
and tracking has to be preceded by an acquisition period
which, due to the multi-modal filtering density function induced
by the sensor factor representation (9), corresponds to
a phase ambiguity resolution. We apply to this problem the
methodology developed in [4]. Fig. 2 illustrates this mechanism
with scalar dynamics $\dot{x}(t) = a_c x(t) + u(t)$ (phase
rate proportional to absolute phase). Starting with a multi-modal
density, the filter converges recursively to an essentially
uni-modal shape. The acquisition can be formalized
by introducing an internal measure of dispersion; the first
passage moment of this measure across a given threshold
defines the acquisition time.


Fig. 2. Absolute phase acquisition ($x_{n+1} = a x_n + u_n$ with
q = 0.01, r = 0.5, a = 0.99).
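
The acquisition mechanism can be reproduced with a brute-force grid version of the recursion (4)-(5) for the scalar model of Fig. 2. This is a sketch under stated assumptions and not the finite Gaussian-sum representation (9) used in the paper: the density is simply sampled on a fixed grid, and the function name and grid layout are hypothetical.

```python
import numpy as np

def grid_filter_step(F_prev, grid, z, a, q, r):
    """One prediction/filtering step of (4)-(5) on a uniform grid.

    F_prev : filtering density at the previous step, sampled on `grid`
    z      : current observation (z1, z2) from model (2)
    a, q   : scalar dynamics x_{n+1} = a x_n + u_n with noise variance q
    r      : observation noise variance
    """
    dx = grid[1] - grid[0]
    # Prediction (4): Gaussian transition kernel applied to the old density.
    S = np.exp(-0.5 * (grid[:, None] - a * grid[None, :]) ** 2 / q)
    P = S @ F_prev
    # Filtering (5): multiply by the observation factor (7); the "- 1.0"
    # only rescales the factor to keep the exponent non-positive.
    lam = np.hypot(z[0], z[1]) / r
    eta0 = np.arctan2(z[1], z[0])
    F = P * np.exp(lam * (np.cos(grid - eta0) - 1.0))
    return F / (F.sum() * dx)            # normalisation plays the role of C_n
```

Starting from a flat density over several 2π periods and iterating this step with a = 0.99, q = 0.01 and r = 0.5 is one way to observe how the competing modes are progressively suppressed.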


4.  MOBILE  COMMUNICATION 

We now consider signaling $y(t)$ as the transient evolution of
the first-order linear dynamical system

$$\dot{y}(t) = -\beta y(t), \quad t \in [kT_s, (k+1)T_s[ \qquad (10)$$

randomly initialized at the beginning of each symbol interval,
$[kT_s, (k+1)T_s[$, $k = 0, 1, \ldots$, according to the probability
density function

$$P(y(kT_s)) = \sum_{j=1}^{M} \frac{1}{M} \, \delta\left( y(kT_s) - \frac{d}{2} a^{(j)} \right), \quad j = 1, \ldots, M \qquad (11)$$

where M is the number of distinct equiprobable symbols,
$a^{(j)} = 2j - M - 1$, and $\alpha = \beta T_s$ and $d$ are modulation
parameters. Since the phase trajectories are not restricted
to a 2π interval, we call this modulation scheme
M-ary absolute phase modulation (M-APM). When $\alpha \to 0$,
two well-known digital schemes are produced: M-PSK, for
$d = 2\pi/M$, and orthogonal M-FSK, for $\alpha d = \pi$. Continuous
phase modulation schemes can also be obtained by
adjusting parameters $\alpha$ and $d$ [3].

We adopt the discrete version of (10)

$$y_{n+1} = (1 - \beta\Delta) y_n, \quad n = 1, 2, \ldots, N \qquad (12)$$

where $N = T_s/\Delta$ is the number of samples per symbol
interval.
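
The signaling model (11)-(12) can be illustrated by generating the M candidate phase trajectories of one symbol interval. The function name is hypothetical; the only relation used beyond (11) and (12) is βΔ = α/N, which follows from α = βT_s and Δ = T_s/N.

```python
import numpy as np

def mapm_trajectories(M, d, alpha, n_samples):
    """Phase trajectories y_n of the M symbols over one symbol interval.

    Row j-1 holds the samples produced by (12) when the interval is
    initialised at (d/2) * a^(j), with a^(j) = 2j - M - 1 as in (11).
    """
    beta_delta = alpha / n_samples                 # beta*Delta = (beta*Ts)/N
    a = 2 * np.arange(1, M + 1) - M - 1            # symbol levels a^(j)
    y0 = 0.5 * d * a                               # initial phases
    decay = (1.0 - beta_delta) ** np.arange(n_samples)
    return y0[:, None] * decay[None, :]
```

With `alpha = 0` and `d = 2*np.pi/M` every row is constant and the rows are spaced by 2π/M, which is the M-PSK limit mentioned above.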

The main task of the receiver is to acquire and track
the dynamics process $x_n$ (which provides a range solution).
While tracking, it must decide, at the end of each symbol
interval, which symbol was sent. We assume perfect symbol
timing. As in [2], [3], the receiver is a parallel bank of M
'matched' nonlinear filters, each one preceded by a phase
rotation block that eliminates the contribution of $y_n$ from
the observation vector $z_n$. The detector computes the weights
associated with each filter block and decides according to a
MAP  criterion.  The  prediction  density  P*./v  of  the  selected 
filter  is  used  to  set  the  initial  condition  of  all  filters  to  the 
next  symbol  interval.  This  corresponds  to  a  symbol  aided 
decision  criterion. 

4.1.  LEO  satellite  example 

Consider communication between an Earth station and a
LEO satellite describing a circular orbit with an altitude of
780 km [6]; the emitter and the receiver are both in the
equatorial plane and the nominal carrier frequency is
$f_0 = 1.6$ GHz. Fig. 3 shows phase, phase rate and acceleration
during the entire visibility window (11.1 minutes), assuming
a minimum elevation angle of 8.2°. Consider
also quaternary phase modulation (M = 4) with signaling
parameters $\alpha = 1$, $d = 5.7$ [2], bit rate 2400 bit/s
($T_s = 1/1200$ s), N = 10 samples per symbol
($\Delta = T_s/N \approx 83$ μs), bit signal-to-noise ratio $E_b/N_0 = 8$ dB,
and $r_c = T_s/(2 (E_b/N_0) \log_2 M)$.

Fig. 3. Phase, phase rate and acceleration for a LEO satellite
trajectory along the visibility window.

Fig. 4. Phase, phase rate and acceleration and their estimates
for a LEO satellite trajectory ($E_b/N_0 = 8$ dB and
$q_c = 50$ rad² s⁻⁶). Horizontal axis: symbol index.

Fig. 4 shows the satellite tracking ability, modelling phase
as a double integrated Brownian motion with driving noise
variance $q_c = 50$ rad² s⁻⁶ as in Fig. 1. From 12000 transmitted
symbols, corresponding to a time horizon of 10 seconds,
only 9 symbols were detected in error. One of these situations
can be seen in Fig. 5, where $\theta_n = x_{1,n} + y_n$. Notice
the large phase variation along each symbol and the receiver's
recovery capacity after the false detection of symbol 2153.


5.  CONCLUDING  REMARKS 

The proposed receiver is a parallel open-loop structure suited
for DSP-based implementation. This allows the implementation of
the advanced algorithms required to optimally integrate all the
available information. This was already the perspective of
reference [7] at the beginning of the seventies, and it is today's
concept of software radios.
ABSTRACT

In this paper, we address the problem of channel fading
in communication systems. In particular, we focus
on the flat fading phenomenon. We study some
time-frequency based techniques for the detection of
frequency modulated signals subjected to flat fading channels.
A comparison, based on bit error rate, of these
techniques is also presented.


1.  INTRODUCTION 

In a wireless mobile communication system, a transmitted
signal may experience random changes in its amplitude,
phase and angle of arrival. These changes, referred
to as fading, can be caused by multiple paths
between the transmitter and receiver and/or by motion
between the receiver and transmitter [1]. If the multiple
paths are large in number and there is no line-of-sight
signal component (no dominant component), the
envelope of the received signal is statistically described
by a Rayleigh probability density function [2].

Multipath fading results in two major types of degradation:
frequency selective fading and frequency non-selective
(or flat) fading [3]. Several techniques are available to
combat fading [4].

In this paper, we focus on frequency shift keying
(FSK) modulated signals transmitted through a channel
subjected to flat fading. We review some time-frequency
techniques used to retrieve such signals in a noisy
environment. Also, we evaluate these techniques in terms of
bit error rate and compare them to standard methods
used in telecommunications.

2. PRELIMINARIES

It can be shown that for a complex signal $z(t)$, transmitted
over a flat fading channel, the received complex
signal is given by [3]

$$y(t) = m(t) \cdot z(t) \qquad (1)$$

where $m(t)$ is a complex Gaussian process. This channel
has also been called a multiplicative fading channel.
The above equation indicates that the original signal
gets corrupted by a process $m(t)$ whose amplitude can
be modeled by a Rayleigh density function while its
phase is uniformly distributed.

The dramatic drop in power of the signal, due to
fading, makes it very difficult to detect. In some
situations, such as FSK modulated signals, the signal
information is contained in its instantaneous frequency
(IF). Thus, by using an appropriate tool to estimate
the IF of the modulated signal, we may be able to retrieve
the signal without having to use expensive and
very complex receivers. In this paper, we review some
time-frequency techniques in order to detect the original
transmitted signal.

The field of time-frequency signal analysis is one of
the recent developments which provides suitable tools
for analysing non-stationary signals, characterised by a
time-varying spectral content, occurring in many fields
of engineering [5]. Time-frequency distributions (TFDs)
are natural extensions of the Fourier transform. They
map a one-dimensional signal, a function of time only, to
a two-dimensional quantity, a function of time and frequency.
One of the most popular TFDs is the Wigner-Ville
distribution (WVD), defined as [5]

$$W(t, f) = \int_{-\infty}^{+\infty} z\left(t + \frac{\tau}{2}\right) z^*\left(t - \frac{\tau}{2}\right) e^{-j2\pi f\tau} \, d\tau \qquad (2)$$

where $z(t)$ is the analytic form of the real-valued signal
under investigation. The WVD first moment yields the
IF of the analysed signal [6], defined as [7]

$$f_i(t) = \frac{1}{2\pi} \frac{d\phi(t)}{dt} \qquad (3)$$

where $\phi(t)$ is the phase of the signal $z(t)$.
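
A discrete-time counterpart of the WVD (2), together with a peak-based frequency estimate, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the lag window is truncated at the signal edges, and the signal is assumed to be analytic and sampled at a normalised rate of 1. Because the lag product advances the phase by 4πf per lag sample, FFT bin k corresponds to the normalised frequency k/(2·nfft).

```python
import numpy as np

def wvd(z, nfft=None):
    """Discrete Wigner-Ville distribution of an analytic signal z.

    Returns an array W of shape (len(z), nfft); column k corresponds to
    the normalised frequency k / (2 * nfft).
    """
    z = np.asarray(z, dtype=complex)
    n_samples = len(z)
    nfft = nfft or n_samples
    W = np.zeros((n_samples, nfft))
    for n in range(n_samples):
        # Largest symmetric lag that stays inside the signal and the FFT.
        max_lag = min(n, n_samples - 1 - n, nfft // 2 - 1)
        lags = np.arange(-max_lag, max_lag + 1)
        kernel = np.zeros(nfft, dtype=complex)
        kernel[lags % nfft] = z[n + lags] * np.conj(z[n - lags])
        # The kernel is conjugate-symmetric in the lag, so its FFT is real.
        W[n] = np.fft.fft(kernel).real
    return W

def peak_frequency(W):
    """Average the distribution over time and return the peak frequency."""
    nfft = W.shape[1]
    return np.argmax(W.mean(axis=0)) / (2.0 * nfft)
```

For a constant-frequency tone, `peak_frequency(wvd(z))` returns the tone frequency up to the grid resolution 1/(2·nfft); a time-varying IF could instead be read from `np.argmax(W, axis=1)`.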


In practice, and considering orthogonal binary FSK
(BFSK), the original real-valued signal is generated as

$$s(t) = \sqrt{\frac{2E_s}{T}} \cos(\omega_0 t + d(t) \, \Omega \, t)$$

where $\omega_0$ is the carrier (angular) frequency, $\Omega$ a constant
offset, $d(t) = 1$ or $-1$ (depending on whether the bit 1 or
0 has been transmitted), $E_s$ is the symbol energy and
T is the signaling period. At the receiver, we have

$$r(t) = a \sqrt{\frac{2E_s}{T}} \cos(\omega_0 t + d(t) \, \Omega \, t + \phi) + n(t) \qquad (4)$$

where $a$ is the fading coefficient, assumed to have a
Rayleigh distribution, and $\phi$ is a random phase uniformly
distributed over $[0, 2\pi]$.

For a flat fading channel $a$ and $\phi$ are assumed to be
constant over one signaling period T and the additive
noise $n(t)$ is assumed to be zero-mean white Gaussian
with a variance equal to $\sigma^2 = N_0/2$. In this case, it can
be shown that for an envelope or a square wave detector
the bit error rate (BER) for a non-coherent detection is
given by [8]

$$P_e = \frac{1}{2 + \gamma} \qquad (5)$$


and  for  a  coherent  detection  it  is  [8] 


with 


T  =  (6) 


with  E[]  being  the  expectation  operator. 


3.  PROPOSED  IF  ESTIMATORS 


The received signal $r(t)$, over one signaling period, is assumed
to be a constant-amplitude sinusoid. The WVD,
defined above, can be used to estimate the frequency
of $r(t)$. This can be done by first evaluating the
time-frequency distribution of the received signal (or its
analytic version) and then searching for the maximum
of the distribution for every time instant. The WVD
performance can be shown to degrade significantly at
low signal-to-noise ratio (SNR).

In order to improve the statistical performance of
the signal detection, one can use the B-distribution.
The B-distribution is defined as [9]

$$W_B(t, f) = \int\!\!\int_{-\infty}^{+\infty} G(t', \tau) \left[ z_r\left(t - t' + \frac{\tau}{2}\right) z_r^*\left(t - t' - \frac{\tau}{2}\right) \right] e^{-j2\pi f\tau} \, dt' \, d\tau \qquad (7)$$

where $G(t, \tau)$ is a function given by

$$G(t, \tau) = \left( \frac{|\tau|}{\cosh^2(t)} \right)^{\sigma}$$

and $\sigma$ is a real parameter. We see that the B-distribution
is similar to the WVD but instead of taking the FT of
the product, we must first convolve (in the time variable
t) the product $[z_r(t + \tau/2) \cdot z_r^*(t - \tau/2)]$ with the function
$G(t, \tau)$ and then take the FT of the result. To estimate
the IF of the analysed signal, we can use
the peak of the B-distribution. As a quick
qualitative comparison, consider the WVD and the
B-distribution of a sinusoid in additive white Gaussian
noise with 0 dB SNR. These two distributions are displayed
in Figures 1 and 2 respectively. Observe the superiority
of the B-distribution in suppressing the noise.


Figure 1: The WVD of a sinusoid in 0 dB noise (Fs = 1 Hz, N = 511, time resolution = 1).

In Figures 3 and 4, we plot the IF estimates of a
sinusoid (normalised frequency = 0.25) embedded in -4
dB noise, using the WVD and the B-distribution respectively.
Once again, we can observe from these plots
that the B-distribution gives a better result compared
to the WVD.

Figure 2: The B-distribution of a sinusoid in 0 dB noise (Fs = 1 Hz, N = 511, time resolution = 1).

Figure 3: WVD based IF estimate of a sinusoid in -4
dB noise.

Figure 4: B-distribution based IF estimate of a sinusoid
in -4 dB noise.

In what follows, we will use the above time-frequency
distributions, namely the WVD and the B-distribution,
in order to retrieve the IF of a signal transmitted over a
flat fading channel. We will compare the performance of
these distributions in terms of their respective bit error
rates.


4.  COMPARISONS 

As  stated  above,  in  the  comparisons,  we  limit  our  dis¬ 
cussion  to  a  flat  fading  channel  only.  We  can  have  two 
cases:  (i)  the  transmitted  signals  are  totally  unknown 
(exept  that  they  are  sinusoids)  and  (ii)  the  signals  are 
known  but  we  don’t  know  which  one  was  sent.  For  the 
first  case,  unknown  frequencies  of  the  transmit  signals, 
we  apply  the  time-frequency  distributions  directly  on 
the  received  signal  in  order  to  decide  which  frequency 
is  present.  For  the  second  case,  the  transmitted  signals 
are  known  and  we  can  incorporate  this  information  in 
the  time-frequency  distribution  in  order  to  decide  which 
frequency  is  present  in  the  received  signal. 

Let us first consider the case of totally unknown
transmitted signals. For that, we generate two orthogonal
sinusoids $s_0(t)$ and $s_1(t)$. To account for the flat
fading channel, we multiply each of these two signals
by $\alpha$ (a value taken from a Rayleigh distribution such
that $E(\alpha^2) = 1$). An initial random phase $\phi$ as well
as some zero-mean Gaussian noise $n(t)$ are added to
the signals, as suggested by Equation 4. The signal $r(t)$
(more precisely its analytic form) is then analysed using
the time-frequency distributions, and the peaks of these
distributions yield the corresponding frequency of
the received signal. Based on this frequency, we decide
which symbol ($s_0(t)$ or $s_1(t)$) was sent in that particular
symbol interval. When the noise power increases,
we tend to make more errors in our decision. Since we
have a binary modulation, the number of errors divided
by the total number of transmitted symbols constitutes
the bit error rate (BER). Figure 5 displays the BER versus the
energy-to-noise ratio $\gamma$ (in dB) for each time-frequency
distribution. For comparison purposes, we have also
analysed the transmitted signal using the periodogram.
It is seen that the performance of the B-distribution is
close to that of the periodogram (which is the optimal
detector for a sinusoid). Note that since the transmitted
signal is just a sinusoid, we expect a constant frequency
over the whole symbol interval in the time-frequency
plane (see for instance Figure 2). Thus, we average the
time-frequency distribution (over time) and then search
for the maximum to obtain an estimate of the transmitted
frequency.
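
The simulation procedure described above can be sketched in a few lines. The sketch below is only an illustration for the periodogram reference detector, not the B-distribution or WVD detectors: it generates a complex (analytic) tone at one of two orthogonal frequencies, scales it by a Rayleigh amplitude with $E(\alpha^2) = 1$, adds a random phase and complex Gaussian noise, and decides from the periodogram peak. The function name `simulate_ber`, the block length `n`, the tone bins `k0` and `k1`, and the nearest-bin decision rule are assumptions made for this example.

```python
import cmath
import math
import random


def simulate_ber(snr_db, n_symbols=1000, n=16, k0=4, k1=12, seed=0):
    """Estimate the BER of peak-of-periodogram binary FSK detection.

    n, k0, k1 and seed are illustrative choices, not values from the paper.
    snr_db is the symbol energy-to-noise ratio in dB (before fading).
    """
    rng = random.Random(seed)
    gamma = 10.0 ** (snr_db / 10.0)
    # Unit-amplitude tone: symbol energy is n, so the complex noise
    # variance per sample is n / gamma (half of it per component).
    std = math.sqrt(n / gamma / 2.0)
    # Precompute the DFT twiddle factors exp(-j*2*pi*k*t/n).
    twiddle = [[cmath.exp(-2j * math.pi * k * t / n) for t in range(n)]
               for k in range(n)]
    errors = 0
    for _ in range(n_symbols):
        bit = rng.randrange(2)
        k_tx = k1 if bit else k0
        alpha = math.sqrt(rng.expovariate(1.0))  # Rayleigh, E[alpha^2] = 1
        phi = rng.uniform(0.0, 2.0 * math.pi)    # random initial phase
        r = [alpha * cmath.exp(1j * (2.0 * math.pi * k_tx * t / n + phi))
             + complex(rng.gauss(0.0, std), rng.gauss(0.0, std))
             for t in range(n)]
        # Periodogram: squared DFT magnitude; the peak bin is the
        # frequency estimate.
        power = [abs(sum(x * w for x, w in zip(r, row))) ** 2
                 for row in twiddle]
        k_peak = max(range(n), key=power.__getitem__)
        # Decide for the tone whose bin is closest to the peak.
        decided = 1 if abs(k_peak - k1) <= abs(k_peak - k0) else 0
        errors += decided != bit
    return errors / n_symbols
```

Calling `simulate_ber` over a range of `snr_db` values gives a curve of the same kind as the periodogram curve in Figure 5, although the numerical values depend on the assumed parameters.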

Now, we consider that we know the signals to be
transmitted but we do not know which one has been
transmitted at the particular time interval (symbol
interval) of interest. In this case, we can incorporate this
information in our time-frequency distributions in the
detection process. An analytic version of the received
signal $r(t)$ is multiplied by the analytic version of $s_0(t)$
and the analytic version of $s_1(t)$, respectively. Using the
B-distribution or the WVD on the product, we can easily
determine the frequency present in $r(t)$. The BERs of the
time-frequency techniques, along with that of the
periodogram, for this detection procedure are plotted in
Figure 6.

Figure 5: Performance of the various time-frequency
distributions for a Rayleigh fading channel when there
is no knowledge of the transmitted signals.


Figure  6:  Performance  of  the  various  time-frequency 
distributions  for  a  Rayleigh  fading  channel  using  the 
knowledge  about  the  transmitted  signals. 

In our present situation, we see that using
the knowledge of the transmitted signals in the detection
does not significantly improve the performance of
the detector.

We should note that the time-frequency techniques
are very robust and can be applied to other FM signals
such as M-ary FSK.

We also note that the proposed detector can be applied
in the global system for mobile communications (GSM)
because this system uses Gaussian minimum shift keying
(GMSK) modulation, which can be non-coherently
detected as simple FSK [2].

5.  CONCLUSION 

In  this  paper,  we  addressed  the  problem  of  retrieving 
FM  signals  transmitted  over  a  flat  or  Rayleigh  fading 
channel.  In  particular,  we  have  applied  some  time- 
frequency  tools  to  analyse  the  received  signal.  We  have 
seen  that  these  tools  can  be  used  whether  we  know  the 
transmitted  signals  or  not.  A  comparison,  based  on  the 
bit-error  rate,  between  these  techniques  has  also  been 
presented.  The  results  show  that  the  B-distribution 
gives  a  better  detection  performance  compared  to  the 
WVD. 

6.  REFERENCES 

[1] B. Sklar. Rayleigh Fading Channels in Mobile Digital
Communication Systems Part I: Characterisation.
IEEE Communications Magazine, pages 90-100, 1997.

[2] T.S. Rappaport. Wireless Communications.
Prentice-Hall, Upper Saddle River, NJ, USA, 1996.

[3] S. Stein. Fading Channel Issues in System Engineering.
IEEE Journal on Selected Areas in Communications,
SAC-5:68-89, Feb. 1987.

[4] B. Sklar. Rayleigh Fading Channels in Mobile Digital
Communication Systems Part II: Mitigation.
IEEE Communications Magazine, pages 102-109, 1997.

[5] L. Cohen. Time-Frequency Analysis. Prentice-Hall,
1995.

[6] F. Hlawatsch and G.F. Boudreaux-Bartels. Linear
and quadratic time-frequency signal analysis. IEEE
Signal Processing Magazine, 9(2):21-67, 1992.

[7] J. Ville. Théorie et application de la notion de signal
analytique. Câbles et Transmissions, 2A(1):61-74,
1948.

[8] J.G. Proakis. Digital Communications. McGraw-Hill,
third edition, 1995.

[9] B. Barkat and B. Boashash. A High-Resolution
Quadratic Time-Frequency Distribution for Multicomponent
Signals Analysis. IEEE Trans. on Signal
Processing, 2001. (In print).




MRC RECEIVER PERFORMANCE WITH MQAM IN CORRELATED RICIAN FADING CHANNELS


Chunhua  Yang,  Guoan  Bi 

School  of  Electrical  and  Electronic  Eng., 
Nanyang  Technological  University, 
Singapore. 


A. R. Leyman

Digital Comm. Strategic Research Group,
Center for Wireless Communication,
20, Science Park Road, Science Park II,
Singapore.


ABSTRACT 

Due  to  difficulties  in  deriving  the  probability  density 
function,  performance  of  MRC  diversity  receiver  in  cor¬ 
related  Rician  fading  channels  is  rarely  reported  in  the 
literature.  This  letter  shows  that  the  difficulty  can  be 
avoided  by  a  linear  transformation  technique.  General 
closed-form  expressions  of  average  symbol  error  rate  for 
various  modulation  schemes  can  be  easily  derived.  As 
an  example,  this  letter  derives  the  SER  of  MQAM  over 
correlated  Rician  fading  channels. 


1.  INTRODUCTION 

Diversity  is  an  effective  technique  to  combat  the  detri¬ 
mental  effects  of  multipath  fading.  Previous  work  on 
performance  analysis  of  diversity  reception  mainly  fo¬ 
cused  on  the  case  of  independent  fading  with  binary 
modulation  schemes.  In  [1]  the  performance  of  an  L- 
branch  equal  gain  combiner  on  independent  Rician  fad¬ 
ing  channels  was  derived.  In  [3],  the  average  bit  error 
rate  (BER)  of  a  BPSK  system  with  MRC  on  a  gen¬ 
eral  Rician  fading  channel  was  studied.  Subsequently, 
the  average  BERs  of  M-ary  modulated  signals  for  non¬ 
diversity  reception  over  Rician  fading  were  presented  in 
[2]. The exact expressions of SER for multilevel modulated
signals with MRC over Rician fading channels are
seldom reported in the literature, possibly because of the
difficulties in deriving the probability density function
(PDF).

In  this  letter,  we  show  that  the  difficulty  in  deriving 
the  PDF  can  be  avoided  by  a  linear  transformation 
technique  to  obtain  the  required  characteristic  func¬ 
tion  (CF).  The  exact  expressions  of  SER  can  be  easily 
derived  for  the  MRC  diversity  receiver  with  multilevel 
quadrature  amplitude  modulation  (MQAM)  in  the  cor¬ 
related  Rician  fading  channels.  The  method  is  simple 
and  general  enough  to  be  used  for  any  correlated  signal 
model  with  arbitrary  fading  parameters. 


2.  CHARACTERISTIC  FUNCTION 


Consider an L-branch MRC over the correlated Rician
fading channel and assume the received signals from the
L-branch diversity system in complex Gaussian form to
be $x(t) = x_c(t) + j x_s(t)$, where the real part
$x_c(t) = [x_{c1}, \ldots, x_{cL}]$ and imaginary part
$x_s(t) = [x_{s1}, \ldots, x_{sL}]$ are Gaussian random
processes with $E[x_c] = c = [c_1, \ldots, c_L]$, $E[x_s] = 0$,
and covariance matrix $R$. The $ij$th element of $R$ is
$\rho_{ij}\sigma_i\sigma_j$, where $\rho_{ij}$ is the correlation
coefficient and $\sigma_i^2$ is the variance of $x_{ci}$ or $x_{si}$.

The resultant SNR at the output of the L-branch MRC is


£ 

fc=i 
The characteristic function (CF) of $\gamma$ was given in [5]:


$\Phi(jv) = E[e^{jv\gamma}]$


exp 
which is difficult to use in the performance analysis.
Because the covariance matrix $R$ is positive definite, it
can be diagonalized with an orthonormal matrix $Q$,
defined as


$Q R Q^T = Q^T R Q = \Lambda$ (3)


where $\Lambda$ is a diagonal matrix with elements $\lambda_k > 0$
($k = 1, \ldots, L$), $\lambda_k$ is the $k$th eigenvalue of the covariance
matrix $R$, and $Q$ is the orthonormal matrix composed
of the eigenvectors of $R$.

By using the orthonormal matrix $Q$, the characteristic
function of $\gamma$ can be simplified as


$(jV)  = 




exp 
L 

k\ 1  -  2jv^k 
=  exp 


N0 


-1 


3V  /  T  2 3V  a  \ 

~  Q  i  iV^  1  Q 


nr 

k=l 


l 


2i^fc 


exp 


3vf^k  tt  _ 


-i 


jv  H  (  2 jv  . 

To”  l1-^)  ’ 


*  1  -  2;Kfe 


(4) 


where  r/  =  Qc,  £fc  =  and  Pb  =  jfc-  It  is  seen  that 
(4) is the CF of the Hermitian form for an independent
complex Gaussian process $y = Qx$ with mean $\eta$
and the diagonal covariance matrix $\Lambda$ of (3). The CF of
the output SNR of the correlated branch signal $x$ is thus
obtained by deriving the CF of the output SNR of the new
independent signal $y$, obtained by multiplying $x$ by $Q$,
where $Q$ is regarded as the transformation matrix. The PDF
of the resultant SNR $\gamma$ can be easily obtained by taking
the inverse Fourier transform of (4).
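
The decorrelating transformation can be illustrated numerically. The following sketch assumes a hypothetical three-branch covariance matrix with correlation $\rho^{|i-j|}$ (the value of `rho`, the branch means `c` and the helper name `decorrelate` are not from the letter) and uses the convention $Q R Q^T = \Lambda$. It shows that the transformed branches have a diagonal covariance and that the combined energy $\sum_k |x_k|^2$ is unchanged by the transformation, which is why the output SNR can be analysed on the independent signal $y = Qx$.

```python
import numpy as np


def decorrelate(R):
    """Return (Q, lam) such that Q @ R @ Q.T = diag(lam), Q orthonormal."""
    lam, V = np.linalg.eigh(R)  # columns of V are the eigenvectors of R
    return V.T, lam


L = 3
sigma = np.ones(L)          # hypothetical branch standard deviations
rho = 0.6                   # hypothetical adjacent-branch correlation
idx = np.arange(L)
R = np.outer(sigma, sigma) * rho ** np.abs(idx[:, None] - idx[None, :])

Q, lam = decorrelate(R)

# Transform the (hypothetical) branch means and one sample of branch signals.
c = np.array([1.0, 0.8, 0.5])
eta = Q @ c
rng = np.random.default_rng(0)
x = rng.standard_normal(L) + 1j * rng.standard_normal(L)
y = Q @ x

# Energy is preserved because Q is orthonormal.
energy_x = float(np.sum(np.abs(x) ** 2))
energy_y = float(np.sum(np.abs(y) ** 2))
```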

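The conditional SER for square QAM is given by [4]

$$P_s(\gamma) = 2q\,\mathrm{erfc}\left(\sqrt{p\gamma}\right) - q^2\,\mathrm{erfc}^2\left(\sqrt{p\gamma}\right) \quad (5)$$

where $q = 1 - \frac{1}{\sqrt{M}}$ and $p = \frac{1.5\log_2 M}{M-1}$.
By using the expressions [4]

$$\mathrm{erfc}\left(\sqrt{b\gamma}\right) = \frac{2}{\pi}\int_0^{\pi/2} \exp\left[-b\gamma\csc^2\theta\right] d\theta$$

$$\mathrm{erfc}^2\left(\sqrt{b\gamma}\right) = \frac{4}{\pi}\int_0^{\pi/4} \exp\left[-b\gamma\csc^2\theta\right] d\theta, \quad (6)$$

the average SER for MQAM is derived by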
Exchanging the order of integration and using the
definition in (2), (7) can be expressed as

The SER in (8) is a finite-range integral whose
integrand is a product of polynomial and exponential
functions. We assume that omni-directional
antennas are arranged in one dimension. The distance
between adjacent antennas is $d$ and the spatial
correlation follows an exponential function. The elements
of the covariance matrix $R$ for $x_c$ or $x_s$ are given by:


$R_{ij} = \rho_{ij}\sigma_i\sigma_j = \sigma_i\sigma_j \exp[\,\ldots\,]$ (9)


where $\lambda$ is the carrier wavelength and $k = 21.4$. Based
on (8), numerical results, shown in Figs. 1 and 2,
are calculated for different values of the Rician factor $K$.
Fig. 1 compares the SERs for $M = 4$, 16 and 64 with
correlated and uncorrelated fading channels. Figure 2
shows that when $d/\lambda > 0.4$ the SER becomes constant,
which means that the distance between adjacent
antennas can be as small as $0.4\lambda$.
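
The finite-range integral forms of erfc and erfc squared used in (6) are what make the averaging over the fading tractable, and they are easy to check numerically. The sketch below is a small verification under stated assumptions: it evaluates both integrals with a midpoint rule and compares them with the library `math.erfc`, and it evaluates the conditional SER of (5) with $q = 1 - 1/\sqrt{M}$ and $p = 1.5\log_2 M/(M-1)$ as read from the text. The function names and the number of integration steps are illustrative.

```python
import math


def erfc_finite_range(x, steps=2000):
    """erfc(x) = (2/pi) * integral_0^{pi/2} exp(-x^2 / sin^2(t)) dt, x >= 0."""
    h = (math.pi / 2.0) / steps
    total = sum(math.exp(-x * x / math.sin((i + 0.5) * h) ** 2)
                for i in range(steps))
    return (2.0 / math.pi) * h * total


def erfc_squared_finite_range(x, steps=2000):
    """erfc(x)^2 = (4/pi) * integral_0^{pi/4} exp(-x^2 / sin^2(t)) dt, x >= 0."""
    h = (math.pi / 4.0) / steps
    total = sum(math.exp(-x * x / math.sin((i + 0.5) * h) ** 2)
                for i in range(steps))
    return (4.0 / math.pi) * h * total


def conditional_ser_square_qam(gamma, M):
    """Conditional SER of square M-QAM for a given SNR gamma, as in (5)."""
    q = 1.0 - 1.0 / math.sqrt(M)
    p = 1.5 * math.log2(M) / (M - 1)
    e = math.erfc(math.sqrt(p * gamma))
    return 2.0 * q * e - q * q * e * e
```

Because the integrand depends on $\gamma$ only through an exponential, averaging over $\gamma$ replaces that exponential by the characteristic function evaluated at an imaginary argument, which is the step that leads to the finite-range integral for the average SER.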


3.  CONCLUSION 

By transforming the correlated Rician fading signals,
we can easily obtain the characteristic function of the
virtually independent Rician fading signals. This letter
illustrates the derivation of the SER with MQAM
in correlated Rician fading channels. The SERs with
other multilevel modulations such as MPSK and MFSK
can be derived in a similar way.
Figure 1: Average BER versus average input SNR for
MQAM ($M = 4, 16, 64$) with MRC over the correlated
Rician fading channels ($L = 3$, $d/\lambda = 0.2, \infty$, $K = 5$).


Figure 2: Average BER versus $d/\lambda$ for MQAM receiver
with MRC over the correlated Rician fading channels
($M = 16$, $L = 3, 4$, $K = 0, 3, 5$, and SNR per channel
= 10 dB).
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ABSTRACT 

In  many  source  detection  and  localization  appli¬ 
cations,  the  number  of  receivers  is  smaller  than 
the  number  of  sensors,  where  the  receivers  are 
connected  to  the  sensors  via  a  preprocessing  de¬ 
vice.  The  preprocessing  device  can  easily  be  im¬ 
plemented  in  hardware  by  linear  transformation  of 
the  measured  signals  at  the  sensors.  In  this  paper, 
the  Cramer-Rao  lower  bound  for  this  problem  is 
developed  and  it  is  shown  that  by  judicious  choice 
of  the  preprocessing  matrix,  it  is  possible  to  re¬ 
duce  the  bound  on  direction  of  arrival  estimation 
errors. The results demonstrate the trade-off between
azimuth and elevation estimation errors using
a planar array divided into subarrays.

1.  INTRODUCTION 

Traditional array processing techniques assume that
simultaneous measurements from all sensors are
available. However, in many applications, such as
radar and satellite, the number of sensors may be
large, and using the same number of receivers as
sensors results in a large number of receivers and
A/D converters, which are expensive, especially in
wide-band applications with high sampling rates.
Furthermore, source detection and localization
algorithms applied to data received by large arrays
are computationally expensive. Therefore, in practice,
it is desirable to use fewer receivers than sensors.
This makes it possible to process less data without
reducing the antenna aperture. This approach requires
a transformation of the received signal at the array
into data to which the source detection and localization
algorithms can be applied. In [1], the Maximum-Likelihood
estimator for source localization using fewer receivers
than sensors has been presented. This approach assumes
a linear, time-varying transformation unit
as a preprocessing stage. That is,

$x(t_l) = G(t_l)\,y(t_l)$, (1)

where $y(t_l)$ is the received signal at the array at
time $t_l$, $x(t_l)$ is the input signal to the processor,
and the matrix $G(t_l)$ is the linear, time-varying
transformation matrix.

In [2] this approach has been adopted with
2 receivers where the preprocessing unit contains
two switches, i.e. the matrix $G(t_l)$ contains zeros
and ones. In [1] and [2], it is assumed that the
transformation matrix is given.

Our goal in this paper is to obtain the optimal
transformation matrix $G(t_l)$ by means of the
Cramer-Rao lower bound (CRLB). Two cases are
examined. In the first, a two-dimensional array
with a time-invariant transformation matrix is
considered. A possible application for this problem
is phased-array radar systems, which may be
divided into several subarrays. A single receiver
is assigned to the output of each subarray unit.
In the second problem, we assume that multiple
snapshots are available with a time-varying
transformation matrix, where the criterion for
optimization is the CRLB on the source direction
estimation error variance.
2. PROBLEM FORMULATION

Consider a far-field source radiating a narrowband
signal, received by a planar array of $N$ sensors. The
complex envelope of the vector of the received signal
at the sensors is given by:

$y_l = a(u,v)\,s_l + n_l, \quad l = 1, \ldots, L$ (2)

where $s_l$ is the complex envelope of the signal at
the $l$th snapshot and $a(u,v)$ is the array steering
vector. The signal snapshots, $(s_1, \ldots, s_L)$, are
assumed to be deterministic and unknown. The samples
of the noise vector, $\{n_l\}_{l=1}^{L}$, are assumed to
be zero-mean, complex Gaussian and i.i.d.:
$n_l \sim N_c(0, \sigma^2 I)$, where the noise variance, $\sigma^2$, is known.

The source location parameters, $u$ and $v$, are
given by $u = \sin\phi\cos\theta$ and $v = \cos\phi\cos\theta$, where
$\phi$ and $\theta$ are the source azimuth and elevation,
respectively. The elements of the steering vector
are given by $a_n(u,v) = e^{j\frac{2\pi}{\lambda}(d_{xn}u + d_{yn}v)}$, where the
vector $(d_{xn}, d_{yn})$ denotes the $n$th sensor location.

Because of the limited number of receivers, the
measurements $y_l$ are linearly transformed to provide
the input to the receivers according to (1).
Now the data model is

$x_l = G_l\,a(u,v)\,s_l + G_l n_l$ (3)

The transformed noise vector, $e_l = G_l n_l$, is now zero-mean,
complex Gaussian with covariance matrix

$R_e = \sigma^2 G_l G_l^H$. (4)

In this model, it is assumed that the transformation
matrix can be updated at each snapshot. A
simple implementation of the preprocessing stage
is by using subarrays, where the array is divided
into groups of sensors which are linearly combined
to provide the input of each receiver. In other
words, the matrix $G$ can be formed as a block-diagonal
matrix whose blocks are row vectors which
express the complex weights for the sensors of the
subarray. Now, the number of parameters of the
preprocessing stage to be determined is equal to
the number of sensors in the array.

3. CRAMER-RAO LOWER BOUND

3.1. Case 1: Single Snapshot

In this section, we first develop the CRLB for the
single snapshot case. To simplify the notation, the
subscript $l$ is dropped when considering a single
snapshot.

In the model of (3), the information on the unknown
parameters is in the mean of the data, and
therefore the Fisher Information Matrix (FIM) for
estimating the vector of unknown real parameters
$\Psi$ is given by [3]:

$$J_\Psi = 2\,\mathrm{Re}\left\{\left[\frac{\partial (a(u,v)s)}{\partial \Psi}\right]^H G^H R_e^{-1} G \left[\frac{\partial (a(u,v)s)}{\partial \Psi}\right]\right\} \quad (5)$$

and the CRLB on estimation errors of $\Psi$ is given
by $J_\Psi^{-1}$. The vector of unknown parameters, $\Psi$,
includes the source location parameters $(u,v)$, in
addition to the real and imaginary parts of the
signal. The bound on the source azimuth and
elevation, $(\phi, \theta)$, can be obtained from $J_\Psi^{-1}$ by using
the chain rule.

By using the expression for $R_e$ in (4), one obtains
$Q = G^H R_e^{-1} G = \frac{1}{\sigma^2} G^H (G G^H)^{-1} G$. For
the subarray preprocessing configuration, the matrix
$G$ can be written as:

$G = \mathrm{blockdiag}(w_1, w_2, \ldots)$ (6)

where $w_m$ is a row vector which denotes the weight
vector for the elements of the $m$th subarray. Thus,
the matrix $Q$ is a block-diagonal matrix whose
$m$th block is $\frac{w_m^H w_m}{\sigma^2\, w_m w_m^H}$. Now the derivation
of the FIM from (5) is straightforward.
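
The block-diagonal structure of $Q$ can be confirmed with a short numerical sketch. The weights, the noise variance and the helper name `subarray_matrix` below are hypothetical; the sketch builds $G$ from two subarray weight row vectors, forms $R_e = \sigma^2 G G^H$ and $Q = G^H R_e^{-1} G$, and compares each diagonal block with $w_m^H w_m / (\sigma^2 w_m w_m^H)$.

```python
import numpy as np


def subarray_matrix(weights):
    """Place each subarray weight row vector on the block diagonal of G."""
    n_total = sum(len(w) for w in weights)
    G = np.zeros((len(weights), n_total), dtype=complex)
    col = 0
    for m, w in enumerate(weights):
        G[m, col:col + len(w)] = w
        col += len(w)
    return G


sigma2 = 0.5  # hypothetical noise variance
weights = [np.array([1.0, 0.5 + 0.5j]), np.array([1.0, -1.0j])]

G = subarray_matrix(weights)
R_e = sigma2 * G @ G.conj().T              # covariance of the transformed noise
Q = G.conj().T @ np.linalg.inv(R_e) @ G    # Q = G^H R_e^{-1} G

# Expected m-th diagonal block: w_m^H w_m / (sigma^2 * w_m w_m^H).
expected_blocks = [np.outer(w.conj(), w) / (sigma2 * np.vdot(w, w).real)
                   for w in weights]
```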

3.2.  Case  2:  Multiple  Snapshots 

The measurements at different snapshots are independent,
and therefore it can easily be shown
that the FIM in this case is given by:

$$J_\Psi = \sum_{l=1}^{L} J_{\Psi,l} \quad (7)$$

where $J_{\Psi,l}$ is the FIM given the $l$th snapshot, and
$\Psi$ contains all the unknown parameters, that is,
$(u,v)$ in addition to the real and imaginary parts
of the unknown signal $(s_1, \ldots, s_L)$. Note that $J_{\Psi,l}$
is a function of $G_l$ only. However, the CRLB is a
non-linear function of $G_1, \ldots, G_L$.


4.  EXAMPLES 


4.1.  Example  1:  Single  snapshot 

In order to demonstrate the trade-off between the
azimuth and elevation estimation errors using a
planar array, the following example is presented.
Consider an array of two linear, horizontal subarrays
(along the x axis). The subarrays are set in
parallel at different heights, and the array consists
of 2x2=4 elements where each row/subarray consists
of 2 elements. The horizontal and vertical
distance between adjacent sensors is $d$. With no
loss of generality we can impose the weight of a
single element of each subarray to be one, and the
matrix $G$ becomes

$$G = \begin{bmatrix} 1 & w_0 & 0 & 0 \\ 0 & 0 & 1 & w_1 \end{bmatrix} \quad (8)$$

Now, the CRLB on the source location parameters
$(u,v)$ can be expressed as a function of $(w_0, w_1)$.
Assuming a single snapshot with known signal, it
can be shown that the FIM for this problem is
given by:
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I  w\  e  A  j 
where $\lambda$ is the signal wavelength. For this example,
the distance between adjacent sensors, $d$, was
chosen to be half a wavelength. Fig. 1 presents
the bound on both parameters $u$ and $v$ as a function
of $\xi$, where $\xi$ is related to $w_1$ via $w_1 = e^{j\frac{2\pi d}{\lambda}\xi}$.

This  figure  demonstrates  the  trade-off  between  the 
bounds  on  the  two  unknown  parameters,  u  and  v. 
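
For readers who wish to reproduce the qualitative behaviour, the single-snapshot bound for the 2x2 example can be sketched directly from the Gaussian-mean FIM, $J = 2\,\mathrm{Re}\{D^H G^H R_e^{-1} G D\}$, with the columns of $D$ being the derivatives of $a(u,v)s$ with respect to $u$ and $v$. The sketch assumes a known signal, sensor positions $(0,0)$, $(d,0)$, $(0,d)$, $(d,d)$, and the steering-vector convention $a_n = e^{j\frac{2\pi}{\lambda}(d_{xn}u + d_{yn}v)}$; the function name `crlb_uv` and the default values of `s` and `sigma2` are hypothetical.

```python
import cmath
import math


def crlb_uv(w0, w1, u, v, d_over_lambda=0.5, s=1.0, sigma2=1.0):
    """Return (CRLB(u), CRLB(v)) for G = [[1, w0, 0, 0], [0, 0, 1, w1]]."""
    k = 2.0 * math.pi * d_over_lambda
    # Sensor positions in units of d: first row/subarray, then second row.
    pos = [(0, 0), (1, 0), (0, 1), (1, 1)]
    a = [cmath.exp(1j * k * (px * u + py * v)) for px, py in pos]
    da_du = [1j * k * px * an for (px, _), an in zip(pos, a)]
    da_dv = [1j * k * py * an for (_, py), an in zip(pos, a)]
    G = [[1.0, w0, 0.0, 0.0], [0.0, 0.0, 1.0, w1]]

    def through_g(vec):
        # Derivative of the mean G a s with respect to one parameter.
        return [s * sum(g * x for g, x in zip(row, vec)) for row in G]

    mu_u, mu_v = through_g(da_du), through_g(da_dv)
    # R_e = sigma2 * G G^H is diagonal for this subarray structure.
    noise = [sigma2 * (1.0 + abs(w0) ** 2), sigma2 * (1.0 + abs(w1) ** 2)]

    def fim_entry(p, q):
        return 2.0 * sum((pi.conjugate() * qi).real / nv
                         for pi, qi, nv in zip(p, q, noise))

    j_uu, j_uv, j_vv = fim_entry(mu_u, mu_u), fim_entry(mu_u, mu_v), fim_entry(mu_v, mu_v)
    det = j_uu * j_vv - j_uv ** 2
    # Diagonal of the inverse of the 2x2 FIM.
    return j_vv / det, j_uu / det
```

Sweeping the phase of `w1` with `w0 = 1` and plotting the square roots of the two returned values gives curves of the same kind as those discussed for Fig. 1; note that the FIM becomes singular when `w1` places a null of the second subarray on the source.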


Figs. 2 and 3 show the bounds on the two unknown
parameters as a function of the absolute value
and phase of $w_1$, where $w_0$ was set to 1.

In cases where the number of sensors is larger
than in the above example, or in the multiple
snapshots case, the number of unknown parameters
(non-zero elements of the matrices $G_l$) may be
large and the multidimensional optimization problem
is solved numerically.


4.2.  Example  2:  Multiple  snapshots 

Consider an array of two linear, horizontal subarrays
(along the x axis). The subarrays are set in
parallel at different heights, i.e. the array consists
of 2x4=8 elements where each row contains
4 elements. The distance between the sensors is
half a wavelength along both axes. A source is located
at azimuth $\phi = 30°$ and elevation $\theta = 0°$,
that is, $(u,v) = (\frac{1}{2}, \frac{\sqrt{3}}{2})$. The signal source is
assumed to be unknown deterministic. Two snapshots
with two different transformation matrices
were assumed. The optimization criterion was to
minimize the azimuth estimation error standard
deviation (STD) while ignoring the bound on the source
elevation error. The bound is minimized with respect
to the unknown parameters of the matrices
$G_l$ using the Genetic Algorithm, and it is compared
to the bound that is obtained using a transformation
matrix for conventional beamforming in
the direction of the source. The results show that
by optimization of the preprocessing stage, the
bound on azimuth error STD can be reduced from
1.82° to 0.47°. This improvement is achieved at
the cost of a greater bound on elevation error STD.
Using this optimization procedure, one is able to
control the accuracy in azimuth, elevation or any
combination of them.

In the above examples, the bounds were calculated
assuming that the source direction is known.
Although in some applications, such as radar¹, the
source direction may be roughly known, this may
not be the case in many others. For these problems
the minimax criterion may be applied, that is,

$$(\hat G_1, \ldots, \hat G_L) = \arg\min \left\{ \max_{(u_0,v_0)} \mathrm{CRLB}(u_0,v_0) \,\middle|\, G_1, \ldots, G_L \right\}$$

where $\mathrm{CRLB}(u_0,v_0)$ denotes the evaluated CRLB
on the source location parameters, $(u,v)$, or any
combination of them, where the source is located at
$(u_0, v_0)$.

¹ The illuminated targets are within the radar beam.


5. CONCLUSIONS

The problem of determining the linear preprocessing
matrix for source localization with fewer
receivers than sensors is addressed. The CRLB
for this problem is developed and used as the criterion
for optimization. For a planar array divided
into subarrays, the trade-off between azimuth and
elevation error STDs is demonstrated. The problem
of determining a linear, time-varying preprocessing
stage is also investigated.


Figure 1: The CRLB on $u$ and $v$ as a function of
$\xi$: $w_1 = e^{j\frac{2\pi d}{\lambda}\xi}$, $w_0 = 1$. [Plot of CRLB$^{1/2}(u,v)$; source azimuth 20°.]

Figure 2: The CRLB on $u$ as a function of $|w_1|$
and $\xi$: $w_1 = |w_1| e^{j\frac{2\pi d}{\lambda}\xi}$, $w_0 = 1$. [Plot of CRLB$^{1/2}(u)$ versus the phase of $w_1$ in degrees; source azimuth 20°.]

Figure 3: The CRLB on $v$ as a function of $|w_1|$
and $\xi$: $w_1 = |w_1| e^{j\frac{2\pi d}{\lambda}\xi}$, $w_0 = 1$. [Plot of CRLB$^{1/2}(v)$ versus the phase of $w_1$ in degrees; source azimuth 20°.]
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ABSTRACT 

We  consider  Friedlander's  array  interpolation  technique, 
whose  main  shortcoming  in  multi-source  scenarios  is  that 
it  does  not  provide  sufficient  robustness  against  sources  ar¬ 
riving  outside  specified  interpolation  sectors.  In  this  paper, 
we  propose  a  new  interpolation  approach  which  incorpo¬ 
rates  such  robustness  property.  Our  technique  minimizes 
the  interpolation  error  inside  the  sector  of  interest  while  set¬ 
ting  multiple  “stopband”  constraints  outside  this  sector  to 
prevent  performance  degradation  effects  caused  by  out-of¬ 
sector  sources.  Based  on  such  robust  approach  to  array  in¬ 
terpolation,  we  develop  convex  formulations  of  the  inter¬ 
polation  matrix  design  problem  using  Second-Order  Cone 
(SOC)  programming. 

1.  INTRODUCTION 

In array processing, specific array structures are frequently
used to simplify implementations of subspace direction finding
methods. For example, the Uniform Linear Array (ULA)
structure is exploited to formulate computationally efficient
search-free root-MUSIC and MODE algorithms [1], [2].
Uniform Circular Arrays (UCAs) and arrays with translational
invariances also facilitate search-free formulations of subspace
methods, such as conventional and multiple invariance
ESPRIT [3], UCA root-MUSIC and UCA-ESPRIT [4],
multiple invariance root-MUSIC [5], and RARE [6].

Unfortunately, none of these methods is straightforwardly
applicable to arrays with an arbitrary geometry. In order
to enable such application, Friedlander developed an elegant
approach [7] based upon the idea of interpolating a virtual
array of a required structure (e.g. ULA, UCA, etc.) using
the original "non-structured" array. The interpolation is
achieved by minimizing the error between the interpolated
and desired array responses in some chosen interpolation
sector. Although the array interpolation approach has
several attractive properties [8]-[9] and has been successfully
applied to practical problems [10], an essential shortcoming
of this method is that it does not provide sufficient
robustness against sources which arrive outside the specified
interpolation sectors. In this paper, we propose a new
interpolation approach which has this robustness property. Our
technique minimizes the interpolation error inside the sector
of interest under multiple "stopband" constraints outside
this sector in order to prevent performance degradation
effects caused by out-of-sector sources. Based on this robust
array interpolation approach, we develop convex formulations
of the interpolation matrix design problem using SOC
programming.

2.  THE  CONVENTIONAL  ARRAY 
INTERPOLATION  APPROACH 

The key idea of the conventional array interpolation approach
is to interpolate virtual array observations inside a
preliminary specified angular sector $\Theta = [\theta_{\min}, \theta_{\max}]$ using
real array data. The $n \times n$ interpolation matrix $B$ should
obey

$B^H a(\theta) \approx \bar a(\theta), \quad \theta \in \Theta$

where $a(\theta)$ and $\bar a(\theta)$ are the $n \times 1$ steering vectors of the
real and virtual arrays, respectively. Here, $\theta$ is the angle,
$n$ is the number of sensors, and $(\cdot)^H$ denotes the Hermitian
transpose. Dividing the sector $\Theta$ into $M - 1$ uniform
subintervals of width $\delta$ and defining the $n \times M$ matrices

$C = [a(\theta_{\min}), a(\theta_{\min} + \delta), \ldots, a(\theta_{\max} - \delta), a(\theta_{\max})]$,

$\bar C = [\bar a(\theta_{\min}), \bar a(\theta_{\min} + \delta), \ldots, \bar a(\theta_{\max} - \delta), \bar a(\theta_{\max})]$

the interpolation matrix $B$ can be computed as a least squares
solution to

$B^H C = \bar C$

In its simplest form, this solution can be written as [7]

$B = (C C^H)^{-1} C \bar C^H$ (1)

After noise prewhitening, the interpolated (virtual) array
snapshots can be computed as

$y(i) = (B^H B)^{-1/2} B^H x(i), \quad i = 1, \ldots, N$ (2)

where $x(i)$, $i = 1, \ldots, N$ are the real array observations,
and $N$ is the number of snapshots.
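
The least squares design of the interpolation matrix can be sketched as follows. This is only an illustration under assumed parameters: a hypothetical six-element array obtained by perturbing a half-wavelength ULA, a virtual ULA as the target, and a sector of $[-15°, 15°]$ sampled at 61 angles. The sketch solves $C^H B = \bar C^H$ with `numpy.linalg.lstsq`, which yields the same least squares solution as (1) when $C C^H$ is well conditioned but avoids forming the inverse explicitly.

```python
import numpy as np


def steering(positions, theta):
    """Columns are steering vectors; positions are in wavelengths."""
    return np.exp(2j * np.pi * np.outer(positions, np.sin(theta)))


rng = np.random.default_rng(1)
n = 6
virtual_pos = 0.5 * np.arange(n)                    # virtual ULA
real_pos = virtual_pos + rng.uniform(-0.1, 0.1, n)  # hypothetical real array

theta = np.deg2rad(np.linspace(-15.0, 15.0, 61))    # in-sector grid (M = 61)
C = steering(real_pos, theta)                       # n x M, real array
C_bar = steering(virtual_pos, theta)                # n x M, virtual array

# Least squares solution of B^H C = C_bar, written as C^H B = C_bar^H.
B, *_ = np.linalg.lstsq(C.conj().T, C_bar.conj().T, rcond=None)

# Relative in-sector interpolation error.
rel_err = np.linalg.norm(B.conj().T @ C - C_bar) / np.linalg.norm(C_bar)
```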

The application of any direction finding estimator to (2)
and the generalization of this approach to the case of multiple
angular interpolation sectors are straightforward [7].

The main shortcoming of this approach is that in
multi-source scenarios, it does not provide sufficient robustness
against sources impinging on the array outside $\Theta$. The
presence of such (sometimes quite powerful) sources may lead
to a performance breakdown of the direction finding technique
applied to interpolated array observations.

3.  THE  ROBUST  ARRAY  INTERPOLATION 
APPROACH 

To incorporate robustness against out-of-sector sources in
the array interpolation approach, we reformulate the interpolation
matrix design problem as the quadratic minimization
problem with multiple inequality constraints

$$\min_{B} \|B^H C - \bar C\| \quad \text{subject to} \quad \|B^H a(\theta_k)\| \le \varepsilon, \quad \theta_k \in \bar\Theta, \quad k = 1, 2, \ldots, K \quad (3)$$

where $\bar\Theta$ combines all directions lying outside the sector $\Theta$,
$\varepsilon > 0$ is the parameter characterizing the "stopband"
(out-of-sector) attenuation, and $K$ is the number of inequality
constraints.

Alternatively, another formulation can be used

$$\min_{B} \max_{m} \|B^H a(\theta_m) - \bar a(\theta_m)\|, \quad \theta_m \in \Theta, \quad m = 1, 2, \ldots, M$$

$$\text{subject to} \quad \|B^H a(\theta_k)\| \le \varepsilon, \quad \theta_k \in \bar\Theta, \quad k = 1, 2, \ldots, K \quad (4)$$

where the minimax criterion is employed.
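
To give a feel for the effect of the stopband constraints without a cone solver, the sketch below uses a simpler surrogate that is not the formulation of this paper: the hard constraints of (3) are replaced by a quadratic penalty, $\min_B \|B^H C - \bar C\|^2 + \mu \sum_k \|B^H a(\theta_k)\|^2$, which has the closed-form solution $B = (C C^H + \mu A A^H)^{-1} C \bar C^H$, where the columns of $A$ are the out-of-sector steering vectors. The array geometry, the angle grids, the small diagonal loading and all function names are assumptions made for the illustration.

```python
import numpy as np


def steering(positions, theta):
    """Columns are steering vectors; positions are in wavelengths."""
    return np.exp(2j * np.pi * np.outer(positions, np.sin(theta)))


def penalized_interpolation(C, C_bar, A_out, mu, loading=1e-6):
    """Penalized least squares surrogate for problem (3) (not an SOC program)."""
    n = C.shape[0]
    lhs = C @ C.conj().T + mu * (A_out @ A_out.conj().T) + loading * np.eye(n)
    return np.linalg.solve(lhs, C @ C_bar.conj().T)


def stopband_energy(B, A_out):
    """Sum over out-of-sector directions of ||B^H a(theta_k)||^2."""
    return float(np.linalg.norm(B.conj().T @ A_out) ** 2)


rng = np.random.default_rng(2)
n = 6
virtual_pos = 0.5 * np.arange(n)
real_pos = virtual_pos + rng.uniform(-0.1, 0.1, n)  # hypothetical real array

theta_in = np.deg2rad(np.linspace(-15.0, 15.0, 61))
theta_out = np.deg2rad(np.concatenate([np.linspace(-80.0, -30.0, 26),
                                       np.linspace(30.0, 80.0, 26)]))
C = steering(real_pos, theta_in)
C_bar = steering(virtual_pos, theta_in)
A_out = steering(real_pos, theta_out)

B_weak = penalized_interpolation(C, C_bar, A_out, mu=1e-3)
B_strong = penalized_interpolation(C, C_bar, A_out, mu=10.0)
```

Increasing `mu` trades in-sector interpolation accuracy for out-of-sector attenuation; the constrained formulations (3) and (4) make this trade-off explicit through $\varepsilon$ instead.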

4.  CONVEX  FORMULATIONS  USING 
SECOND-ORDER  CONE  PROGRAMS 

In this section, we present SOC formulations of the problems
(3) and (4). Note that an efficient MATLAB toolbox
is available that solves such problems in a computationally
efficient way [11].

The general form of an SOC program is given by

$$\max_{e}\; d^T e \quad \text{subject to} \quad c_q - F_q^T e \in \mathrm{SOC}^{C_q}, \quad q = 1, \ldots, p$$

Here, all vectors and matrices are real-valued, $e$ is a vector
containing the design variables, $p$ is the number of SOC
constraints and $C_q - 1$ is the dimension of the $q$th SOC defined
as

$$\mathrm{SOC}^{C_q} = \left\{ (z_1, z) \in \mathbb{R} \times \mathbb{R}^{C_q - 1} \;\middle|\; z_1 \ge \|z\| \right\}$$

where

$$\bar z = [z_1, z^T]^T = c_q - F_q^T e, \qquad z = [z_2, \ldots, z_{C_q}]^T$$

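Let us introduce the following notations

\[
\mathbf{b} = \mathrm{vec}\{ \mathbf{B}^H \} , \qquad
\mathbf{c} = \mathrm{vec}\{ \mathbf{C} \} , \qquad
\bar{\mathbf{c}} = \mathrm{vec}\{ \bar{\mathbf{C}} \}
\]

Here, vec{·} denotes the vectorization operator, stacking the columns
of a matrix on top of each other. The following property for arbitrary
matrices $\mathbf{X}$, $\mathbf{Y}$ and $\mathbf{Z}$ of conformable
dimensions will be used frequently throughout the text

\[
\mathrm{vec}\{ \mathbf{X} \mathbf{Y} \mathbf{Z} \}
= ( \mathbf{Z}^T \otimes \mathbf{X} ) \, \mathrm{vec}\{ \mathbf{Y} \}
\tag{5}
\]

where $\otimes$ denotes the Kronecker matrix product.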
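The vectorization property vec{XYZ} = (Z^T ⊗ X) vec{Y}, with vec{·}
stacking columns, can be checked numerically. The following is only an
illustrative sketch in plain Python; the helper names (`matmul`, `kron`,
`vec`, `rand_complex_matrix`) and the matrix sizes are assumptions made
for this example and do not come from the paper.

```python
import random

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    """Plain (non-conjugate) transpose."""
    return [list(col) for col in zip(*X)]

def kron(X, Y):
    """Kronecker product of X and Y."""
    return [[x * y for x in xrow for y in yrow]
            for xrow in X for yrow in Y]

def vec(X):
    """Stack the columns of X on top of each other."""
    return [X[i][j] for j in range(len(X[0])) for i in range(len(X))]

def rand_complex_matrix(rows, cols, rng):
    return [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(cols)]
            for _ in range(rows)]

rng = random.Random(0)
X = rand_complex_matrix(2, 3, rng)
Y = rand_complex_matrix(3, 4, rng)
Z = rand_complex_matrix(4, 2, rng)

# Left-hand side: vec{XYZ}
lhs = vec(matmul(matmul(X, Y), Z))

# Right-hand side: (Z^T kron X) vec{Y}
K = kron(transpose(Z), X)
y = vec(Y)
rhs = [sum(K[i][j] * y[j] for j in range(len(y))) for i in range(len(K))]

max_gap = max(abs(a - b) for a, b in zip(lhs, rhs))
print("largest entrywise difference:", max_gap)
```

Note that the transpose in the identity is the plain transpose, not the
Hermitian transpose, which is why the derivation below works with
C^T ⊗ I rather than C^H ⊗ I.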

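Note that

\[
\| \mathbf{X} \| = \| \mathrm{vec}\{ \mathbf{X} \} \|
\tag{6}
\]

Making use of (5) we obtain that

\[
\begin{aligned}
\mathrm{vec}\{ \mathbf{B}^H \mathbf{C} - \bar{\mathbf{C}} \}
&= \mathrm{vec}\{ \mathbf{B}^H \mathbf{C} \} - \bar{\mathbf{c}} \\
&= \mathrm{vec}\{ \mathbf{I} \mathbf{B}^H \mathbf{C} \} - \bar{\mathbf{c}} \\
&= ( \mathbf{C}^T \otimes \mathbf{I} ) \, \mathbf{b} - \bar{\mathbf{c}}
\end{aligned}
\tag{7}
\]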


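where $\mathbf{I}$ is the identity matrix. Similarly,

\[
\mathbf{B}^H \mathbf{a}(\bar{\theta}_k)
= \mathrm{vec}\{ \mathbf{I} \mathbf{B}^H \mathbf{a}(\bar{\theta}_k) \}
= ( \mathbf{a}^T(\bar{\theta}_k) \otimes \mathbf{I} ) \, \mathbf{b}
\tag{8}
\]

4.1. Problem (3)

Using (6)-(8), we can reformulate (3) as

\[
\min_{\mathbf{b}} \; \| ( \mathbf{C}^T \otimes \mathbf{I} ) \mathbf{b} - \bar{\mathbf{c}} \|
\quad \text{subject to} \quad
\| ( \mathbf{a}^T(\bar{\theta}_k) \otimes \mathbf{I} ) \mathbf{b} \| \le \epsilon , \quad
\bar{\theta}_k \in \bar{\Theta}, \quad k = 1, 2, \ldots, K
\tag{9}
\]

Note that in (9), the vectors and matrices $\mathbf{b}$, $\mathbf{C}$,
$\bar{\mathbf{c}}$ and $\mathbf{a}(\bar{\theta}_k)$ are complex valued.
The problem (9) can be replaced by the following equivalent
optimization problem

\[
\begin{aligned}
\max \; & \tau \\
\text{subject to} \; &
\| ( \mathbf{C}^T \otimes \mathbf{I} ) \mathbf{b} - \bar{\mathbf{c}} \| \le -\tau , \\
& \| ( \mathbf{a}^T(\bar{\theta}_k) \otimes \mathbf{I} ) \mathbf{b} \| \le \epsilon , \quad
\bar{\theta}_k \in \bar{\Theta}, \quad k = 1, 2, \ldots, K
\end{aligned}
\tag{10}
\]

The problem (10) can be reformulated in terms of real-valued variables.
Let $(\cdot)_r$ and $(\cdot)_i$ hereafter denote the real and the
imaginary parts of a matrix, respectively. Then (10) becomes

\[
\begin{aligned}
\max \; & \tau \\
\text{subject to} \; &
\left\|
\begin{bmatrix} ( ( \mathbf{C}^T \otimes \mathbf{I} ) \mathbf{b} )_r \\
( ( \mathbf{C}^T \otimes \mathbf{I} ) \mathbf{b} )_i \end{bmatrix}
-
\begin{bmatrix} ( \bar{\mathbf{c}} )_r \\ ( \bar{\mathbf{c}} )_i \end{bmatrix}
\right\| \le -\tau , \\
&
\left\|
\begin{bmatrix} ( ( \mathbf{a}^T(\bar{\theta}_k) \otimes \mathbf{I} ) \mathbf{b} )_r \\
( ( \mathbf{a}^T(\bar{\theta}_k) \otimes \mathbf{I} ) \mathbf{b} )_i \end{bmatrix}
\right\| \le \epsilon , \quad
\bar{\theta}_k \in \bar{\Theta}, \quad k = 1, 2, \ldots, K
\end{aligned}
\]

or, equivalently, maximize $\tau$ subject to

\[
\begin{aligned}
&
\left\|
\begin{bmatrix}
( \mathbf{C}^T \otimes \mathbf{I} )_r & -( \mathbf{C}^T \otimes \mathbf{I} )_i \\
( \mathbf{C}^T \otimes \mathbf{I} )_i & ( \mathbf{C}^T \otimes \mathbf{I} )_r
\end{bmatrix}
\begin{bmatrix} ( \mathbf{b} )_r \\ ( \mathbf{b} )_i \end{bmatrix}
-
\begin{bmatrix} ( \bar{\mathbf{c}} )_r \\ ( \bar{\mathbf{c}} )_i \end{bmatrix}
\right\| \le -\tau , \\
&
\left\|
\begin{bmatrix}
( \mathbf{a}^T(\bar{\theta}_k) \otimes \mathbf{I} )_r & -( \mathbf{a}^T(\bar{\theta}_k) \otimes \mathbf{I} )_i \\
( \mathbf{a}^T(\bar{\theta}_k) \otimes \mathbf{I} )_i & ( \mathbf{a}^T(\bar{\theta}_k) \otimes \mathbf{I} )_r
\end{bmatrix}
\begin{bmatrix} ( \mathbf{b} )_r \\ ( \mathbf{b} )_i \end{bmatrix}
\right\| \le \epsilon , \quad
\bar{\theta}_k \in \bar{\Theta}, \quad k = 1, 2, \ldots, K
\end{aligned}
\]

Defining the $(2n^2 + 1) \times 1$ vectors

\[
\mathbf{d} = [1, 0, \ldots, 0]^T ,
\tag{11}
\]
\[
\mathbf{e} = [ \tau, ( \mathbf{b}^T )_r , ( \mathbf{b}^T )_i ]^T
\tag{12}
\]

and the matrices

\[
\mathbf{M} =
\begin{bmatrix}
1 & \mathbf{0}^T & \mathbf{0}^T \\
\mathbf{0} & ( \mathbf{C}^T \otimes \mathbf{I} )_r & -( \mathbf{C}^T \otimes \mathbf{I} )_i \\
\mathbf{0} & ( \mathbf{C}^T \otimes \mathbf{I} )_i & ( \mathbf{C}^T \otimes \mathbf{I} )_r
\end{bmatrix}
\tag{13}
\]
\[
\mathbf{L}_k =
\begin{bmatrix}
0 & \mathbf{0}^T & \mathbf{0}^T \\
\mathbf{0} & ( \mathbf{a}^T(\bar{\theta}_k) \otimes \mathbf{I} )_r & -( \mathbf{a}^T(\bar{\theta}_k) \otimes \mathbf{I} )_i \\
\mathbf{0} & ( \mathbf{a}^T(\bar{\theta}_k) \otimes \mathbf{I} )_i & ( \mathbf{a}^T(\bar{\theta}_k) \otimes \mathbf{I} )_r
\end{bmatrix}
\tag{14}
\]

we can rewrite this problem as

\[
\begin{aligned}
\max_{\mathbf{e}} \; & \mathbf{d}^T \mathbf{e} \\
\text{subject to} \; &
\begin{bmatrix} 0 \\ ( \bar{\mathbf{c}} )_r \\ ( \bar{\mathbf{c}} )_i \end{bmatrix}
- \mathbf{M} \mathbf{e} \in \mathrm{SOC}^{2nM+1} , \\
& \mathbf{q} - \mathbf{L}_k \mathbf{e} \in \mathrm{SOC}^{2n+1} , \quad
\bar{\theta}_k \in \bar{\Theta}, \quad k = 1, 2, \ldots, K
\end{aligned}
\]

with $\mathbf{q} = [\epsilon, \mathbf{0}^T]^T$, which is the SOC
formulation of the optimization problem (3).

4.2. Problem (4)

Using (5), rewrite the optimization problem (4) as

\[
\begin{aligned}
\max \; & \tau \\
\text{subject to} \; &
\| ( \mathbf{a}^T(\theta_m) \otimes \mathbf{I} ) \mathbf{b} - \bar{\mathbf{a}}(\theta_m) \| \le -\tau , \\
& \| ( \mathbf{a}^T(\bar{\theta}_k) \otimes \mathbf{I} ) \mathbf{b} \| \le \epsilon , \\
& \theta_m \in \Theta, \quad m = 1, 2, \ldots, M , \\
& \bar{\theta}_k \in \bar{\Theta}, \quad k = 1, 2, \ldots, K
\end{aligned}
\tag{15}
\]

Defining the matrix

\[
\mathbf{M}_m =
\begin{bmatrix}
1 & \mathbf{0}^T & \mathbf{0}^T \\
\mathbf{0} & ( \mathbf{a}^T(\theta_m) \otimes \mathbf{I} )_r & -( \mathbf{a}^T(\theta_m) \otimes \mathbf{I} )_i \\
\mathbf{0} & ( \mathbf{a}^T(\theta_m) \otimes \mathbf{I} )_i & ( \mathbf{a}^T(\theta_m) \otimes \mathbf{I} )_r
\end{bmatrix}
\]

and using (11), (12) and (14), we can reformulate (15) as the following
SOC program

\[
\begin{aligned}
\max_{\mathbf{e}} \; & \mathbf{d}^T \mathbf{e} \\
\text{subject to} \; &
\begin{bmatrix} 0 \\ ( \bar{\mathbf{a}}(\theta_m) )_r \\ ( \bar{\mathbf{a}}(\theta_m) )_i \end{bmatrix}
- \mathbf{M}_m \mathbf{e} \in \mathrm{SOC}^{2n+1} , \\
& \mathbf{q} - \mathbf{L}_k \mathbf{e} \in \mathrm{SOC}^{2n+1} , \\
& \theta_m \in \Theta, \quad m = 1, 2, \ldots, M , \\
& \bar{\theta}_k \in \bar{\Theta}, \quad k = 1, 2, \ldots, K
\end{aligned}
\]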
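The passage from complex-valued to real-valued constraints rests on the
fact that, for a complex matrix A and complex vectors b and c, the norm
of Ab - c equals the norm of the stacked real system built from the real
and imaginary parts. The sketch below checks this numerically in plain
Python; the function names and the random test data are assumptions of
this example, not part of the paper, and no SOC solver is involved.

```python
import math
import random

def real_stack_matrix(A):
    """Return [[Re A, -Im A], [Im A, Re A]] for a complex matrix A."""
    top = [[a.real for a in row] + [-a.imag for a in row] for row in A]
    bottom = [[a.imag for a in row] + [a.real for a in row] for row in A]
    return top + bottom

def real_stack_vector(v):
    """Return [Re v; Im v] for a complex vector v."""
    return [x.real for x in v] + [x.imag for x in v]

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def norm(v):
    return math.sqrt(sum(abs(x) ** 2 for x in v))

rng = random.Random(1)

def rand_c():
    return complex(rng.gauss(0, 1), rng.gauss(0, 1))

A = [[rand_c() for _ in range(3)] for _ in range(4)]  # stands in for (C^T kron I)
b = [rand_c() for _ in range(3)]
c = [rand_c() for _ in range(4)]

# Norm of the complex residual A b - c
complex_norm = norm([y - ci for y, ci in zip(matvec(A, b), c)])

# Norm of the equivalent real-valued residual
real_residual = [
    y - ci
    for y, ci in zip(matvec(real_stack_matrix(A), real_stack_vector(b)),
                     real_stack_vector(c))
]
real_norm = norm(real_residual)

print(complex_norm, real_norm)
```

The doubled size of the real system is what leads to cone dimensions of
the form 2(·) + 1 in the SOC programs above, the extra entry carrying
the bound on the norm.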


5.  SIMULATIONS 

In our simulations, the conventional and robust interpolated array
approaches are compared in terms of DOA estimation Root-Mean-Square
Errors (RMSEs) in the presence of an interfering out-of-sector source.
We assume a linear array of ten sensors, N = 100, and two uncorrelated
sources. One of the sources, with SNR = 0 dB, is assumed to be the
signal of interest whose DOA belongs to the interpolation sector
$\Theta$ = [-15°, 15°], and the second one is the interfering
(out-of-sector) source. In each simulation run, the DOAs of the signal
and interfering sources are drawn uniformly from the intervals
[-15°, 15°] and [-90°, -25°] ∪ [25°, 90°], respectively. Furthermore,
the sensor coordinates of the real array are drawn uniformly in each
run from the interval [0, 4.5λ], where λ is the wavelength, and the
coordinates of the leftmost and rightmost array sensors are fixed and
equal to 0 and 4.5λ, respectively. A virtual ULA with half-wavelength
spacing is interpolated using the conventional interpolation technique
(1) and the robust approaches (3)-(4), respectively. In all these
methods, the parameters M = K = 12 are chosen and a nonuniform grid is
used for $\bar{\Theta}$ based on the output of the conventional
beamformer.

The SeDuMi toolbox has been used to solve the corresponding SOC
problems. Diagonal loading is used in the prewhitening step in order to
guarantee a stable inverse of the matrix $\mathbf{B}^H \mathbf{B}$ in
(2). The interpolated root-MUSIC [7] is used to estimate the signal
DOA. In total, 100 independent simulation runs are performed to
estimate the RMSEs, which are displayed in Fig. 1 versus the
Interference-to-Signal Ratio (ISR). This figure shows the substantial
performance improvements provided by the proposed robust approach.

Fig. 1. DOA estimation RMSEs versus the ISR. Curves: interpolated
root-MUSIC with the conventional approach (1), with the robust approach
(3), and with the robust approach (4).


6.  CONCLUSIONS 

A new robust approach to array interpolation has been proposed. Our
technique minimizes the interpolation error inside the sectors of
interest while setting multiple "stopband" constraints outside these
sectors to prevent performance degradation effects caused by
out-of-sector sources. Convex formulations of the interpolation matrix
design problem have been proposed using second-order cone programming.
Simulation results validate the robustness of the proposed technique
and demonstrate substantial performance improvements of our approach
relative to the conventional array interpolation method.
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ABSTRACT 

The problem of estimating the bearing of a single, far-field,
non-constant modulus source, surrounded by local scatterers, using
passive sensor array measurements is addressed. An associated source
bearing estimation problem is formulated, and the Cramer-Rao lower
bound is evaluated. Estimation procedures that assume the source is of
constant modulus and therefore suffer from mis-modeling errors are
investigated. Specifically, the performance of the maximum likelihood
(ML) estimator designed for a constant modulus source, when applied to
a non-constant modulus source, is evaluated, and the exact degradation
in performance is quantified as a function of the source's empirical
variance.


1.  INTRODUCTION 

Recently, bearing estimation for a so-called distributed or scattered
source has begun to attract interest in the literature. A distributed
source may arise due to the multipath scattering effects created by the
presence of local scatterers about the emitter. The spatial extent of a
distributed source is typically characterized by some type of
parametric model. These models have formed the basis of a variety of
recently reported bearing estimation techniques and performance
studies, e.g., [1], [6].

One of the key assumptions found in much of the previously published
work on distributed source bearing estimation is that the transmitted
signal is deterministic, unknown and of constant modulus (CM). This
assumption is particularly useful for the commonly cited complex normal
(Rayleigh amplitude), temporally uncorrelated vector channel formed
between the source and the receiving array. However, despite its
convenience, the CM signal assumption may be inappropriate in many
applications. The present work examines the performance of estimation
procedures designed to perform under the CM assumption when such an
assumption is untrue.


1.1.  Problem  Formulation 

Assuming that the sampling interval is significantly greater than the
channel coherence time, the received array measurement data for K
snapshots may be described as a sequence of zero mean uncorrelated
complex Gaussian random vectors with time varying covariance [2]:

\[
y_t \sim \mathcal{CN}( 0, R_{y_t} ) , \qquad
R_{y_t} = P_t R_h(\zeta) + \sigma^2 I , \qquad t = 1, \ldots, K
\tag{1}
\]

where $R_h$ is the channel covariance matrix depending on the unknown
spatial parameters, $\zeta$ (typically including the mean angle
parameter, $\theta_0$, corresponding to the source bearing), $\sigma^2$
is the noise variance, and $p = [P_1, \ldots, P_K]^T$ is the vector of
instantaneous source powers. If the source is assumed to be CM, then

\[
y_t \sim \mathcal{CN}( 0, R_y ) , \qquad
R_y = P R_h(\zeta) + \sigma^2 I .
\tag{2}
\]

The overall unknown parameter vector reduces to
$\psi = [\zeta^T, P, \sigma^2]^T$. Models of this latter type are very
common in the literature, e.g., [1], [6].

In the sequel we study estimation procedures designed under the
assumption that the source is CM when in fact the source is
non-constant modulus (NCM). Such a model mismatch may arise due to
modeling error, or alternatively, even if it is known that the source
is NCM, it may be tempting to use a CM based algorithm so that the
number of parameters to estimate is small.

The paper is organized as follows: Section 2 examines the limitations
of the estimation problem for the general model of an NCM source (1).
Then, Section 3 analyzes the performance of CM based algorithms
performing on NCM sources. Lastly, Section 4 gives some simulation
results and Section 5 summarizes the paper.
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2.  THE  CRAMER  RAO  BOUND 


2.1. General model

The Cramer Rao bound, $C$, for the general model in (1) with parameter
vector $\psi = [\zeta^T, \sigma^2, p^T]^T$ may be written as:

\[
C = J^{-1} ; \qquad J = \sum_{t=1}^{K} J^{(t)} ; \qquad
[ J^{(t)} ]_{ij} = \mathrm{Tr}\left( R_{y_t}^{-1} \frac{\partial R_{y_t}}{\partial \psi_i}
R_{y_t}^{-1} \frac{\partial R_{y_t}}{\partial \psi_j} \right)
\tag{3}
\]

where $J$ is known as the Fisher Information Matrix (FIM), and
Tr(·) and $[\cdot]_{ij}$ denote the trace operator and the $ij$'th
element of a matrix, respectively. The FIM may be written in block
form:

\[
J = \begin{bmatrix} J_{\eta\eta} & J_{\eta p} \\ J_{p\eta} & J_{pp} \end{bmatrix} ; \qquad
\eta = [\zeta^T, \sigma^2]^T ; \qquad
J_{\eta p} = [ J_{\eta P_1}, \ldots, J_{\eta P_K} ] ; \qquad
J_{pp} = \mathrm{diag}[ J_{P_1 P_1}, \ldots, J_{P_K P_K} ]
\tag{4}
\]

where the notation diag[·] denotes a diagonal matrix of specified
diagonal elements. Note that $J_{pp}$ is a diagonal matrix since the
array measurements are assumed to be statistically independent. This
allows the CRB for the spatial parameters to be expressed as a matrix
whose dimension does not increase with the number of measurements:

\[
C_{\eta\eta} = \left( J_{\eta\eta} - \sum_{t=1}^{K} \frac{1}{J_{P_t P_t}}
J_{\eta P_t} J_{\eta P_t}^T \right)^{-1}
\tag{5}
\]

For the case of CM signals, the sum in (5) simplifies to:

\[
\sum_{t=1}^{K} \frac{1}{J_{P_t P_t}} J_{\eta P_t} J_{\eta P_t}^T
= \frac{1}{J_{PP}} J_{\eta P} J_{\eta P}^T .
\]

It should be stressed that since the number of unknown parameters
increases with the number of samples, the CRB is not necessarily
attained by any estimator and may serve only as a lower bound (see
e.g., [3]). In this sense, this problem is similar to the well known
deterministic unknown point source bearing estimation problem [5].

2.2. Infinite SNR

Some simplification is possible even in a general model for infinite
SNR, i.e., when there exists no additive noise. It is seen that:

\[
J_{\zeta p} = [ J_{\zeta P_1}, \ldots, J_{\zeta P_K} ] ; \qquad
J_{P_t P_t} = \frac{M}{P_t^2} ;
\]
\[
[ J_{\zeta P_t} ]_i = \frac{1}{P_t} [ j_\zeta ]_i , \qquad
[ j_\zeta ]_i = \mathrm{Tr}\left( R_h^{-1} \frac{\partial R_h}{\partial \zeta_i} \right) ,
\tag{7}
\]

where $M$ is the number of sensors. The CRB can now be written as:

\[
C = \begin{bmatrix} C_{\zeta\zeta} & C_{\zeta p} \\ C_{p\zeta} & C_{pp} \end{bmatrix} ; \qquad
C_{\zeta\zeta} = \left( J_{\zeta\zeta} - \frac{K}{M} j_\zeta j_\zeta^T \right)^{-1}
\tag{8}
\]

where it is noted that the CRB for the spatial parameters in the
noiseless case is independent of the sequence $\{P_t\}_{t=1}^{K}$,
which would be expected intuitively. Furthermore, the CM bound at
infinite SNR is of exactly the same form as (8). This property is
intuitively expected since $\zeta$ is a vector of spatial parameters,
and their estimation is not affected by measurement scaling parameters
when there exists no additive noise. This means that for infinite SNR,
the CRB does not depend on the sequence $\{P_t\}_{t=1}^{K}$.

3. CONSTANT MODULUS ALGORITHMS

Consider array measurement data arising from an NCM source as described
by (1). We investigate the performance of algorithms designed to
estimate the bearing of a CM source when applied to NCM data. Such a
scenario may arise if, for example, the source is believed to be CM
when in reality it is NCM. Alternatively, even if it is known that the
source is NCM, it may be tempting to use a CM based algorithm since,
under the CM model, the number of nuisance parameters does not increase
with the number of measurements.

3.1. Consistency

Assume that the empirical first and second order moments of the
instantaneous powers are finite:

\[
\text{A1)} \quad 0 < \bar{P} = \lim_{K \to \infty} \frac{1}{K} \sum_{t=1}^{K} P_t < \infty
\]
\[
\text{A2)} \quad 0 < \overline{P^2} = \lim_{K \to \infty} \frac{1}{K} \sum_{t=1}^{K} P_t^2 < \infty .
\]

Clearly this means that the empirical variance is also finite:

\[
0 \le \Delta P^2 = \lim_{K \to \infty} \frac{1}{K} \sum_{t=1}^{K} ( P_t - \bar{P} )^2
= \overline{P^2} - \bar{P}^2 < \infty .
\]

Consider the class of covariance matching methods for bearing
estimation. Examples include weighted least squares covariance matching
and maximum likelihood (ML) estimation. Under mild conditions, if the
sample covariance matrix
$\hat{R}_y = \frac{1}{K} \sum_{t=1}^{K} y_t y_t^H$ is a consistent
estimate of $R_y = \bar{P} R_h + \sigma^2 I$, then the resulting
$\hat{\zeta}$ is a consistent estimate of the spatial parameters,
$\zeta$ [4]. In order to show the consistency of the sample covariance
matrix, it suffices to show that for all row and column indices $i$ and
$j$:

\[
\lim_{K \to \infty} E \left| \frac{1}{K} \sum_{t=1}^{K} y_t^i ( y_t^j )^* - [ R_y ]_{ij} \right|^2 = 0
\tag{9}
\]

where $y_t^i$ is the $i$th element of the vector $y_t$ at an instant
$t$ and $(\cdot)^*$ denotes the complex conjugate.

Under assumptions A1 and A2, it is easily shown that:

\[
\lim_{K \to \infty} E \left| \frac{1}{K} \sum_{t=1}^{K} y_t^i ( y_t^j )^* - [ R_y ]_{ij} \right|^2
= \lim_{K \to \infty} \left( \frac{1}{K} [ R_y ]_{ii} [ R_y ]_{jj}
+ \frac{1}{K} \Delta P^2 [ R_h ]_{ii} [ R_h ]_{jj} \right) = 0 .
\tag{10}
\]

Thus, $\hat{R}_y$ is a mean square error (MSE) consistent estimate of
$R_y$. Note that $\frac{1}{K} [ R_y ]_{ii} [ R_y ]_{jj}$ depends only on
$\bar{P}$ and not on $p$. The fact that the source is CM or NCM does
not affect this term. The term $\Delta P^2 [ R_h ]_{ii} [ R_h ]_{jj}$ is
proportional to the empirical variance of the source. It is equal to
zero when the source is CM and increases as the source's empirical
variance increases. Hence, as $\Delta P^2$ increases, a larger
observation time, $K$, is needed for the empirical correlation matrix
to get "closer" to $R_y$.

3.2. Small Error Performance Analysis

Attention is focused on the CM ML estimator due to its optimality under
the CM model (2). This estimator assumes $P_t = P \;\; \forall t$,
i.e., it assumes the unknown parameters are:
$\psi = [\zeta^T, \sigma^2, P]^T$. The ML estimate for $\psi$ is then
given as the solution of the following estimating equations:

\[
- \mathrm{Tr}\left( R_y^{-1} \frac{\partial R_y}{\partial \psi_i} \right)
+ \mathrm{Tr}\left( R_y^{-1} \frac{\partial R_y}{\partial \psi_i} R_y^{-1} \hat{R}_y \right) = 0
\quad \forall i .
\tag{11}
\]

The asymptotic behavior of the estimates can be determined by a first
order expansion of the estimating equations (11) about the true
parameter vector. Once again it is stressed that such an analysis
yields the behavior of CM ML estimates when the data is NCM. The first
order expansion is detailed in Appendix A in [2] and yields the
following approximation for the estimation error:

\[
\epsilon = \hat{\psi} - \psi \approx Q^{-1} v , \qquad
[ Q ]_{ij} = \mathrm{Tr}\left( R_y^{-1} \frac{\partial R_y}{\partial \psi_i}
R_y^{-1} \frac{\partial R_y}{\partial \psi_j} \right) , \qquad
[ v ]_i = \mathrm{Tr}\left( R_y^{-1} \frac{\partial R_y}{\partial \psi_i}
R_y^{-1} ( \hat{R}_y - R_y ) \right) .
\tag{12}
\]

As expected, (12) shows the estimate is asymptotically unbiased. Also
observe that matrix $Q$ is simply the CM FIM. The asymptotic covariance
of the estimates is given by (full evaluation in Appendix B in [2]):

\[
C_{\epsilon\epsilon} = E( \epsilon \epsilon^T ) \approx Q^{-1} E( v v^T ) Q^{-1}
= \frac{1}{K} C + \frac{1}{K} \Delta P^2 \, C \mathbf{K} C ,
\tag{13}
\]
\[
[ \mathbf{K} ]_{ij} = \mathrm{Tr}\left( \frac{\partial R_y^{-1}}{\partial \psi_i} R_h
\frac{\partial R_y^{-1}}{\partial \psi_j} R_h \right)
\tag{14}
\]

where $C = Q^{-1}$ is the single snapshot CM CRB. The asymptotic
performance consists of two terms. The first is the CM CRB. The second
(which represents the degradation with respect to the CM CRB) is linear
in the empirical variance of the instantaneous source powers
$\{P_t\}_{t=1}^{K}$, $\Delta P^2$. Note that the dependence of
$C \mathbf{K} C$ on $p$ manifests itself exclusively in terms of the
mean power $\bar{P}$. Hence, for a given average source power, the
performance of the ML estimate deteriorates linearly with the empirical
variance of the instantaneous source power. In other words, performance
deteriorates as $P_t$ becomes "less constant".

3.2.1. Infinite SNR

For infinite SNR, it is seen that $\mathbf{K} = \frac{1}{\bar{P}^2} Q$
such that the spatial parameters estimate's MSE can be expressed as:

\[
C_{\epsilon\epsilon} = \frac{1}{K} C \left( 1 + \frac{\Delta P^2}{\bar{P}^2} \right) .
\tag{15}
\]

Thus the deterioration in performance with respect to the CRB is
proportional to the empirical variance normalized by the empirical mean
squared. Indeed, the empirical distribution of some sources may cause
serious degradation in performance, especially those distributions with
slowly decaying tails. Consider, for instance, a sequence
$\{P_t\}_{t=1}^{K}$ which behaves as if it were a realization of
independent identically distributed (IID) random variables governed by
the Pareto distribution. For this distribution, when the variance is
finite, $\frac{\Delta P^2}{\bar{P}^2} = \frac{1}{a(a-2)}$ where
$a > 2$. Clearly this term becomes arbitrarily large as $a$ approaches
two.
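To give a feel for the size of the infinite-SNR degradation, the sketch
below evaluates the factor 1 + ΔP²/P̄² (empirical variance normalized by
the squared empirical mean) for a given power sequence, and the
corresponding closed form 1 + 1/(a(a-2)) for Pareto-distributed powers.
It assumes the MSE takes the form (1/K) C (1 + ΔP²/P̄²) as discussed
above; the function names are hypothetical and chosen for this example.
The alternating sequence of ones and fives over 100 snapshots is the one
used in the simulations of Section 4.

```python
def mse_inflation(powers):
    """Return 1 + (empirical variance) / (empirical mean)^2 of the powers."""
    K = len(powers)
    mean = sum(powers) / K
    var = sum((p - mean) ** 2 for p in powers) / K
    return 1.0 + var / mean ** 2

def pareto_inflation(a):
    """Same factor for IID Pareto powers with shape a > 2."""
    if a <= 2:
        raise ValueError("the variance is finite only for a > 2")
    return 1.0 + 1.0 / (a * (a - 2.0))

# Alternating powers 1, 5, 1, 5, ... over K = 100 snapshots:
# empirical mean 3, empirical variance 4.
alternating = [1.0, 5.0] * 50
factor = mse_inflation(alternating)
print("alternating 1/5 sequence:", factor)

# The Pareto factor grows without bound as the shape approaches two.
for a in (4.0, 3.0, 2.5, 2.1, 2.01):
    print("Pareto shape", a, "->", pareto_inflation(a))
```

For the alternating sequence the factor is 13/9, i.e. an MSE increase of
roughly 44 percent over the CM CRB under the stated assumption, which is
in line with the modest degradation reported in the simulations.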
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4.  SIMULATIONS 

To compare the derived analytical results to empirical ones we follow
the uniform linear array model proposed in [1]. Simulations of 1000
Monte Carlo runs were carried out, each consisting of 100 snapshots.
Figures 1 and 2 depict simulation results for the specific channel
model given in [1] in two different scenarios. The first scenario
assumes there exists no additive noise while the second does not make
such an assumption. In both cases, half of the instantaneous source
powers, $\{P_t\}$, are taken to be equal to five and half to one, i.e.,
$[P_1, \ldots, P_K]^T = [1, 5, \ldots, 1, 5]^T$. The spatial
parameters, $\zeta$, include the bearing, which is set to a fixed value
$\theta_0$ (or, by the alternative parameterization in [1],
$\omega = 1$).

Figure 1: CM ML DOA estimates ($\hat{\theta}$) versus the number of
sensors for an NCM source with no additive noise,
$[P_1, \ldots, P_K]^T = [1, 5, \ldots, 1, 5]^T$.

Figure 1 depicts the performance of the CM ML estimator for the
bearing, $\theta_0$, versus the number of sensors for the noiseless
scenario. As a reference the NCM CRB is also shown. Figure 2 depicts
the scenario that includes the additive noise. The figure shows the
performance of the CM ML estimator for the bearing $\theta_0$ versus
the SNR. In these simulations, the number of sensors was taken to be 4.
Once again, the CRB is shown for reference. For both figures, it is
seen that the theoretical results fit the empirical results well and
that the degradation in performance compared to the CRB is relatively
small.

Figure 2: CM ML DOA estimates ($\hat{\theta}$) versus the SNR for an
NCM source, $p = [1, 5, \ldots, 1, 5]$, $M = 4$.

ance matching methods which assume the source is of CM is proved.
Focus is then set on CM based ML estimates and their performance is
evaluated and shown to degrade linearly with the source's empirical
variance. Finally, theoretical results are shown to fit empirical
results via simulations.
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ABSTRACT 

This paper presents a novel algorithm for computing the eigenvector
associated with either the largest or the smallest eigenvalue of a
complex Hermitian matrix. Necessary and sufficient conditions for
convergence are proved, and simulations show the superior performance
over traditional methods.

1.  INTRODUCTION 

Classical Direction of Arrival (DOA) and frequency estimation
algorithms [10, 1] require the computation and subsequent tracking of
the eigenvector associated with the smallest eigenvalue of a Hermitian
matrix, henceforth referred to as a minimal eigenvector. Based on the
novel optimisation algorithm developed in [6, 7] for optimising a cost
function on the complex Grassmann manifold, this paper derives a new
algorithm for computing a minimal eigenvector of a matrix. It is proved
that the algorithm converges to a minimal eigenvector provided the
initial vector is not orthogonal to the eigenspace spanned by the
minimal eigenvectors.

The algorithm differs from traditional ones, such as the Power and
Inverse Iteration methods [3], in two important ways. Firstly, whereas
traditional methods can fail to converge in a reasonable number of
iterations if two or more eigenvalues are closely spaced, the proposed
algorithm continues to converge rapidly in such situations. The second
difference is that, unlike traditional methods which converge to the
eigenvector associated with the eigenvalue having the smallest or the
largest absolute value, the proposed algorithm converges to the
eigenvector associated with the eigenvalue having the smallest or the
largest value. (Recall that a Hermitian matrix has real-valued
eigenvalues.)

Notation: The superscripts T and H denote transpose and Hermitian
transpose respectively. Throughout, the Frobenius norm
$\|X\|^2 = \mathrm{tr}\{ X^H X \}$ is used, where tr{·} is the trace
operator. The symbol $I$ denotes the identity matrix whose size can be
determined from its context.

2.  COMPUTING  AN  EXTREME  EIGENVECTOR 

It is well known [3] that, for a Hermitian matrix
$A \in \mathbb{C}^{n \times n}$,

\[
f(x) = \tfrac{1}{2} \mathrm{tr}\{ x^H A x \}
\tag{1}
\]

achieves its minimum, subject to $x^H x = 1$, when
$x \in \mathbb{C}^n$ corresponds to a minimal eigenvector, that is, an
eigenvector associated with the smallest eigenvalue of $A$. This
section specialises the steepest descent on the complex Grassmann
manifold algorithm derived in [6, 7] to this particular cost function.
(Note that by replacing $A$ with $-A$, the same algorithm can be used
to find an eigenvector associated with the largest eigenvalue of $A$.)

An attractive feature of this specialisation is that the optimal step
size can be calculated at each iteration. Although steepest descent
algorithms were derived in [2, 4] for computing a minimal eigenvector,
none of the algorithms incorporated an optimal step size selection
rule.

The key idea behind the algorithm is to rewrite the constrained
optimisation problem as an unconstrained one on a complex Grassmann
manifold. In general, the $(n, p)$ complex Grassmann manifold is the
collection of all $p$-dimensional subspaces of $\mathbb{C}^n$. Since
$x$ is a vector, the relevant manifold is the $(n, 1)$ complex
Grassmann manifold, also known as complex projective space [5, 8]. It
is standard to represent a point in complex projective space by a
vector $x$ where $x$ is constrained to the unit sphere, that is,
$x^H x = 1$. Although both $x$ and $-x$ correspond to the same point on
complex projective space, the cost function (1) is such that
$f(x) = f(-x)$. Thus, by treating $x$ as a point in complex projective
space, a minimal eigenvector of $A$ can be found by minimising $f(x)$
on complex projective space.

As with all descent type algorithms, given a point $x$, the aim is to
compute a descent direction $z$ and a step size $\gamma$ such that
$f(x + \gamma z) < f(x)$. Steepest descent algorithms in Euclidean
space choose $z$ to be the negative of the gradient of $f$. As shown in
[7], this concept can be extended to the complex Grassmann manifold.
Specifically, the steepest descent direction $z$ of the cost function
(1), when treated as a function on complex projective space, can be
shown to be

\[
z = -( I - x x^H ) A x = -A x + ( x^H A x ) x
\tag{2}
\]

provided $x^H x = 1$.

Having derived the steepest descent direction, all that remains is to
determine the step size $\gamma$. It is expedient though to first state
the whole algorithm and then explain how the formula for computing
$\gamma$ was derived.

Algorithm 1 (Minimal Eigenvector) Let $A \in \mathbb{C}^{n \times n}$
be an arbitrary Hermitian matrix. The following algorithm converges to
a minimal eigenvector of $A$ with probability one (see Theorem 2
below).

1. Randomly choose an $x \in \mathbb{C}^n$ with unit norm
($x^H x = 1$).

2. Compute the descent direction $z := \lambda x - A x$ where
$\lambda := x^H A x$. If $\sqrt{z^H z}$ is sufficiently small, then
stop.

3. Compute $\alpha := x^H A^2 x - \lambda^2$ and
$\beta := x^H A^3 x - 3 \alpha \lambda - \lambda^3$. Set $\gamma$ to
the positive root of $\alpha^2 \gamma^2 + \beta \gamma - \alpha = 0$.

4. Set $x := x + \gamma z$. Renormalise by setting
$x := x / \sqrt{x^H x}$. Go to Step 2.

Remark: When implementing Alg. 1, it is important to store $\alpha$,
$\beta$ and $\lambda$ as real-valued variables.
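The four steps of Alg. 1 can be sketched in plain Python as follows.
This is a minimal illustration rather than the author's implementation:
the function names, the tolerance and the iteration cap are assumptions
of this example. Two deliberate deviations are made for numerical
robustness and are equal in exact arithmetic to the quantities in
Step 3: α and β are evaluated through z as α = z^H z and
β = z^H A z - λ z^H z, and the positive root of the quadratic is
computed in a cancellation-free form when β ≥ 0.

```python
import math

def matvec(A, x):
    """Matrix-vector product for A stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def inner(x, y):
    """Hermitian inner product x^H y."""
    return sum(xi.conjugate() * yi for xi, yi in zip(x, y))

def normalise(x):
    n = math.sqrt(inner(x, x).real)
    return [xi / n for xi in x]

def minimal_eigenvector(A, x0, tol=1e-10, max_iter=10000):
    """Steepest descent with optimal step size for Hermitian A (sketch)."""
    x = normalise(x0)                                   # Step 1 (x0 supplied)
    lam = inner(x, matvec(A, x)).real
    for _ in range(max_iter):
        Ax = matvec(A, x)
        lam = inner(x, Ax).real                         # lambda := x^H A x
        z = [lam * xi - axi for xi, axi in zip(x, Ax)]  # Step 2
        alpha = inner(z, z).real                        # = x^H A^2 x - lambda^2
        if math.sqrt(alpha) < tol:
            break
        # beta = z^H (A - lambda I) z = x^H A^3 x - 3 alpha lambda - lambda^3
        beta = inner(z, matvec(A, z)).real - lam * alpha
        disc = math.sqrt(beta * beta + 4.0 * alpha ** 3)
        # Positive root of alpha^2 g^2 + beta g - alpha = 0 (Step 3)
        if beta >= 0:
            gamma = 2.0 * alpha / (beta + disc)
        else:
            gamma = (disc - beta) / (2.0 * alpha ** 2)
        x = normalise([xi + gamma * zi for xi, zi in zip(x, z)])  # Step 4
    return lam, x

# Example: the closely spaced test matrix diag{1, 1.01, 1.02, 1.03, 1.04}.
A = [[1.0 + 0.01 * i if i == j else 0.0 for j in range(5)] for i in range(5)]
lam, x = minimal_eigenvector(A, [1.0] * 5)
print("estimated smallest eigenvalue:", lam)
```

Keeping λ, α and β as real numbers (via `.real`) follows the remark
above; in floating point the imaginary parts are only rounding noise.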

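In order to derive the formula for $\gamma$ in Step 3 of Alg. 1, it is
necessary to obtain an expression for the decrease in cost caused by
taking a step of size $\gamma$ in direction $z$. Since the step is
performed in complex projective space, the point $x$ goes to the point
$\frac{x + \gamma z}{\sqrt{(x + \gamma z)^H (x + \gamma z)}}$ rather
than to the point $x + \gamma z$. Straightforward manipulation shows
that the decrease in cost is given by

\[
f(x) - \frac{1}{2} \,
\frac{ ( x - \gamma \tilde{A} x )^H ( \tilde{A} + \lambda I ) ( x - \gamma \tilde{A} x ) }
{ ( x - \gamma \tilde{A} x )^H ( x - \gamma \tilde{A} x ) }
= \frac{ \alpha \gamma - \tfrac{1}{2} \beta \gamma^2 }{ 1 + \alpha \gamma^2 }
\tag{3}
\]

where

\[
\tilde{A} = A - \lambda I , \qquad
\alpha = x^H \tilde{A}^2 x , \qquad
\beta = x^H \tilde{A}^3 x .
\tag{4}
\]

Note that $\alpha$ and $\beta$ are real-valued since $A = A^H$. Also,
since $x^H x = 1$, $x^H \tilde{A} x = 0$.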

Differentiating (3) with respect to $\gamma$ and setting the result to
zero shows that the greatest decrease in cost occurs when $\gamma$ is
the unique positive root of the quadratic equation given in Step 3 of
Alg. 1.
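The step size rule can be sanity-checked numerically. The sketch below
assumes the decrease in cost has the form
(αγ - βγ²/2)/(1 + αγ²), which is what the ½-scaled cost (1) gives, and
compares the positive root of α²γ² + βγ - α = 0 with a brute-force
search over a grid of step sizes. The values of α and β are arbitrary
illustrative numbers, not taken from the paper.

```python
import math

def decrease(gamma, alpha, beta):
    """Assumed decrease in cost for a step of size gamma."""
    return (alpha * gamma - 0.5 * beta * gamma ** 2) / (1.0 + alpha * gamma ** 2)

def optimal_step(alpha, beta):
    """Positive root of alpha^2 g^2 + beta g - alpha = 0 (requires alpha > 0)."""
    disc = math.sqrt(beta ** 2 + 4.0 * alpha ** 3)
    return (disc - beta) / (2.0 * alpha ** 2)

alpha, beta = 0.7, -0.4   # arbitrary illustrative values with alpha > 0
g_star = optimal_step(alpha, beta)

# Brute-force maximisation of the decrease over a fine grid on (0, 20].
grid = [k * 1e-3 for k in range(1, 20001)]
best_on_grid = max(grid, key=lambda g: decrease(g, alpha, beta))

print("root of the quadratic:", g_star)
print("best step on the grid:", best_on_grid)
```

Because the product of the two roots of the quadratic is -1/α < 0,
exactly one root is positive whenever α > 0, which is why Step 3 can
speak of "the" positive root.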

Before proving global convergence, two properties of Alg. 1 are stated.
Alg. 1 is invariant to shifts; replacing $A$ with $A - \lambda I$ for
any $\lambda \in \mathbb{R}$ has no effect. This supports the empirical
evidence (see Section 3) that closely spaced eigenvalues, which are
known to reduce severely the rate of convergence of Power methods [3],
do not affect the performance of Alg. 1. Alg. 1 is also invariant to
orthogonal changes of coordinates. That is, if Alg. 1 produces the
sequence $\{ x^{(0)}, x^{(1)}, \ldots \}$, then replacing $A$ with
$Q A Q^H$ and $x^{(0)}$ with $Q x^{(0)}$ will produce the sequence
$\{ Q x^{(0)}, Q x^{(1)}, \ldots \}$.


Theorem 2 (Convergence) Let $x$ be the initial vector chosen in Step 1
of Alg. 1. If $\lambda_1$ is the smallest eigenvalue of $A$ and there
exists an eigenvector $v_1$ satisfying both $A v_1 = \lambda_1 v_1$ and
$v_1^H x \ne 0$, then Alg. 1 converges to an eigenvector $v$ satisfying
$A v = \lambda_1 v$.

Proof. Referring to Alg. 1, since $z^H z = 0$ if and only if $x$ is an
eigenvector of $A$, it is clear that Alg. 1 converges to an eigenvector
$v$ of $A$. Let $\bar{\lambda}$ be the eigenvalue associated with $v$.
Assume to the contrary that $\bar{\lambda} > \lambda_1$. Since $v$ must
then be orthogonal to $v_1$, this implies $|v_1^H x| \to 0$. It will be
shown below that one iteration of Alg. 1 increases $|v_1^H x|$ if the
step size $\gamma > 0$ satisfies

\[
\gamma \left[ \alpha - ( \lambda - \lambda_1 )^2 \right] < 2 ( \lambda - \lambda_1 ) .
\tag{5}
\]

Since $x \to v$, it follows that $\lambda = x^H A x \to \bar{\lambda}$
and $\alpha = x^H A^2 x - \lambda^2 \to 0$. This means there will come
a time when $\alpha - ( \lambda - \lambda_1 )^2 < 0$, and hence (5)
too, will hold for all subsequent iterations. This contradicts
$|v_1^H x| \to 0$, proving that $\bar{\lambda} = \lambda_1$.

To show (5) implies $|v_1^H x|$ will increase, note first that direct
substitution proves that

\[
v_1^H \frac{ x + \gamma z }{ \sqrt{ ( x + \gamma z )^H ( x + \gamma z ) } }
= \frac{ 1 - \gamma ( \lambda_1 - \lambda ) }{ \sqrt{ 1 + \alpha \gamma^2 } } \, v_1^H x .
\tag{6}
\]

Since $\alpha \ge 0$, it is readily verified that
$\frac{ 1 - \gamma ( \lambda_1 - \lambda ) }{ \sqrt{ 1 + \alpha \gamma^2 } } > 1$
if and only if (5) holds. That is, (5) implies $|v_1^H x|$ will
increase, unless $|v_1^H x| = 0$. However, the latter cannot occur
because it is straightforward to show that $\lambda_1 \le \lambda$
always holds ($\lambda$ is a weighted average of the eigenvalues of
$A$), and so $1 - \gamma ( \lambda_1 - \lambda )$ can never be zero. □


3.  SIMULATIONS 


This section studies the convergence rate of Alg. 1 and compares it
with traditional methods for calculating extremal eigenvectors. It is
demonstrated that the performance of Alg. 1 is relatively insensitive
to the actual eigenvalue distribution.

The Inverse Iteration method [3] for finding an eigenvector of the
matrix $A$ associated with the eigenvalue having the smallest absolute
value is to generate a sequence $\{ x^{(k)} \}$ of vectors according to
the rule

\[
x^{(k+1)} = \frac{ A^{-1} x^{(k)} }{ \| A^{-1} x^{(k)} \| } .
\tag{7}
\]
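For comparison, the Inverse Iteration rule (7) can be sketched in plain
Python as below. The linear solver (Gaussian elimination with partial
pivoting), the function names and the iteration count are assumptions of
this example; the two diagonal test matrices and the all-ones starting
vector are the ones used in this section. The quantity returned is the
Rayleigh quotient x^H A x of the final iterate, whose excess over the
smallest eigenvalue is the error measure plotted in the figures.

```python
import math

def solve(A, b):
    """Solve A y = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    y = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * y[j] for j in range(i + 1, n))
        y[i] = (M[i][n] - s) / M[i][i]
    return y

def inverse_iteration(A, x0, iterations):
    """Apply rule (7) a fixed number of times; return (x^H A x, x)."""
    x = list(x0)
    for _ in range(iterations):
        y = solve(A, x)                                  # y = A^{-1} x
        nrm = math.sqrt(sum(abs(v) ** 2 for v in y))
        x = [v / nrm for v in y]
    Ax = [sum(a * xi for a, xi in zip(row, x)) for row in A]
    rayleigh = sum(xi.conjugate() * axi for xi, axi in zip(x, Ax)).real
    return rayleigh, x

def diag(values):
    n = len(values)
    return [[values[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

ones = [1.0] * 5
spread, _ = inverse_iteration(diag([1.0, 2.0, 3.0, 4.0, 5.0]), ones, 50)
close, _ = inverse_iteration(diag([1.0, 1.01, 1.02, 1.03, 1.04]), ones, 50)

print("error after 50 iterations, well separated eigenvalues:", spread - 1.0)
print("error after 50 iterations, closely spaced eigenvalues:", close - 1.0)
```

With well separated eigenvalues the error is negligible after 50
iterations, whereas with the closely spaced eigenvalues it is still of
the order of 10^-3, consistent with the sensitivity of (7) to the
eigenvalue distribution discussed in this section.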


Figures  1  to  4  compare  the  Inverse  Iteration  method  (7)  with  the 
Steepest  Descent  method  (Alg.  1).  Figures  1  and  3  show  that  Alg.  1 
outperforms  (7)  if  the  eigenvalues  of  .4  are  closely  spaced,  while 
Figures  2  and  4  demonstrate  that  the  converse  holds  too.  This  is 
now  explained  in  more  detail. 

It is well-known that the convergence rate of the Power and
Inverse Iteration methods [3] applied to the matrix $A$ critically
depends on the eigenvalue distribution of $A$. Indeed, replacing $A$
with $A + \lambda I$ for some constant $\lambda \in \mathbb{R}$ (known as a
shift in the literature) significantly alters the convergence rate of (7).
In comparison, Section 2 shows that such shifts do not alter Alg. 1 at all.
It is therefore expected that the Inverse Iteration method will exhibit
convergence rates ranging from extremely poor to extremely good
depending on the eigenvalue distribution of $A$, whereas Alg. 1 is
expected to achieve a steady rate of convergence over a wide range
of eigenvalue distributions.
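The sensitivity of (7) to a shift can be made concrete with the standard asymptotic result that the eigenvector error of inverse iteration contracts by roughly $|\mu_1/\mu_2|$ per step, where $\mu_1$ and $\mu_2$ are the two eigenvalues of smallest absolute value. The helper below is a hypothetical illustration of that ratio, not part of the paper's algorithms.

```python
def inverse_iteration_factor(eigenvalues, shift=0.0):
    """Asymptotic per-step contraction |mu_1 / mu_2| of rule (7) applied to
    A + shift*I, where mu_1 and mu_2 are the two shifted eigenvalues of
    smallest absolute value. Values close to 1 mean slow convergence."""
    mags = sorted(abs(lam + shift) for lam in eigenvalues)
    return mags[0] / mags[1]

eigs = [1.0, 2.0, 3.0, 4.0, 5.0]
unshifted = inverse_iteration_factor(eigs)       # ratio 1/2
shifted = inverse_iteration_factor(eigs, 10.0)   # ratio 11/12
```

Shifting the spectrum away from the origin pushes the ratio towards one, which is consistent with the poor behaviour of (7) reported for eigenvalues between 10 and 11.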

This hypothesis was tested by plotting the log of the error,
defined as $\log\left(x^{(k)H} A x^{(k)} - \lambda_{\min}\{A\}\right)$ where
$\lambda_{\min}\{A\}$ is the smallest eigenvalue of $A$, against the
iteration number $k$. (The fact that the resulting graphs in Figures 1 to 4
are essentially straight lines shows that both algorithms achieve a linear
rate of convergence [9].) Figure 1 was generated by applying the
algorithms to the matrix $A = \mathrm{diag}\{1, 1.01, 1.02, 1.03, 1.04\}$.
(In all simulations, the initial starting vector was chosen to be
$x^{(0)} = [1\ 1\ 1\ 1\ 1]^T$.) Since the eigenvalues are closely spaced,
Alg. 1 significantly outperforms (7). Conversely, Figure 2 shows
that (7) outperforms Alg. 1 when applied to the matrix

$$A = \mathrm{diag}\{1, 2, 3, 4, 5\}.$$

Figures 3 and 4 suggest that this behaviour is typical. Figure 4
shows the performance of the two algorithms when applied to ten
randomly generated 20-by-20 matrices with eigenvalues uniformly
distributed between 0 and 1. The same ten matrices were then
shifted so their eigenvalues lay between 10 and 11 (that is, each
$A$ was replaced with $A + 10I$), and the results plotted in
Figure 3. Whereas the performance of Alg. 1 is unaltered, (7)
performs badly in Figure 3 but exceptionally well in Figure 4.

The Steepest Ascent method, obtained by replacing $A$ with
$-A$ in Alg. 1, was compared with the Power method for converging
to an eigenvector associated with the largest eigenvalue of $A$.
The Power method updates $x^{(k)}$ according to the rule (cf. (7))
$x^{(k+1)} = \frac{A x^{(k)}}{\|A x^{(k)}\|}$. Figures 5 and 6 were
generated analogously to Figures 3 and 4. They demonstrate that Alg. 1
achieves a convergence rate which is much less sensitive to the location
of the eigenvalues of $A$ than the Power method does.
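A minimal sketch of the Power method rule, again restricted to a diagonal matrix with positive eigenvalues, shows the same shift sensitivity. The function name, the tolerance, and the five sample eigenvalues are assumptions chosen for illustration; they are not the matrices used for Figures 5 and 6.

```python
import math

def power_method_iterations(diag_a, tol=1e-6, max_iters=10000):
    """Count Power method steps until lambda_max - x^T A x < tol
    for A = diag(diag_a), starting from the normalised all-ones vector."""
    n = len(diag_a)
    x = [1.0 / math.sqrt(n)] * n
    lam_max = max(diag_a)
    for k in range(max_iters):
        rayleigh = sum(a * v * v for a, v in zip(diag_a, x))
        if lam_max - rayleigh < tol:
            return k
        # For a diagonal A, A x is an element-wise product.
        y = [a * v for a, v in zip(diag_a, x)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    return max_iters

base = [0.1, 0.3, 0.5, 0.7, 0.9]                  # eigenvalues in (0, 1)
iters_unshifted = power_method_iterations(base)
iters_shifted = power_method_iterations([lam + 10.0 for lam in base])
```

The shifted spectrum needs many more steps because the ratio of the two largest eigenvalues moves from 0.7/0.9 to 10.7/10.9.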

4.  CONCLUSION 

This paper applied the novel optimisation algorithm in [7] to the
problem of finding an eigenvector associated with the smallest or
the largest eigenvalue of a Hermitian matrix. The optimal step size
was calculated and a global convergence proof was given. Simulations
showed that, unlike classical algorithms for finding extremal
eigenvectors, the convergence rate of the proposed method is
relatively insensitive to the eigenvalue distribution.
Figure 1: Graph comparing the convergence rates of the steepest
descent and inverse iteration algorithms when applied to the matrix
$A = \mathrm{diag}\{1, 1.01, 1.02, 1.03, 1.04\}$.

Figure 2: Graph comparing the convergence rates of the steepest
descent and inverse iteration algorithms when applied to the matrix
$A = \mathrm{diag}\{1, 2, 3, 4, 5\}$.

Figure 3: Graph comparing the convergence rates of the steepest
descent and inverse iteration algorithms when applied to ten
randomly generated 20-by-20 matrices with eigenvalues uniformly
distributed between 10 and 11.

Figure 4: Graph comparing the convergence rates of the steepest
descent and inverse iteration algorithms when applied to ten
randomly generated 20-by-20 matrices with eigenvalues uniformly
distributed between 0 and 1.

Figure 5: Graph comparing the convergence rates of the steepest
ascent and Power method algorithms when applied to ten randomly
generated 20-by-20 matrices with eigenvalues uniformly
distributed between 10 and 11.

Figure 6: Graph comparing the convergence rates of the steepest
ascent and Power method algorithms when applied to ten randomly
generated 20-by-20 matrices with eigenvalues uniformly
distributed between 0 and 1.
ABSTRACT

The problem of maximum-likelihood (ML) completion
of a partially specified Toeplitz covariance matrix is
crucial in several applications, such as the detection
and estimation of more independent Gaussian sources
than sensors ($m > M$) in minimum-redundancy sparse
linear antenna arrays. Given the sufficient statistic in
the form of the $M$-variate direct data covariance
matrix $\hat R$, we describe an algorithm that finds a
positive-definite completed $M_\alpha$-variate Toeplitz matrix
($M_\alpha > M$) with (locally) maximal likelihood ratio (LR).
Simulations demonstrate that a statistically high LR is achieved,
compared with $L_2$ optimisation.


1. PROBLEM FORMULATION

Consider an $M$-element nonuniform linear array (NLA)
with sensors located at positions $\mathbf{d}$, restricted to integer
values measured in the inter-element spacing units of
$d$, usually equal to half a wavelength:

$$\mathbf{d} = [d_1 = 0,\ d_2,\ d_3,\ \ldots,\ d_M]. \qquad (1)$$

We assume Gaussian processes are observed at the output
of the sparse array as a combination of $m$ uncorrelated
plane waves with DOAs $\theta = [\theta_1, \ldots, \theta_m]^T$,
powers $P = \mathrm{diag}[p_1, \ldots, p_m]$ and white noise of power $p_0$.
Thus the $M$-variate vector of observed sensor outputs
at time $t$ (the “snapshot”) is

$$y(t) = B x(t) + \eta(t), \quad \text{for } t = 1, \ldots, N \qquad (2)$$

where $y(t) \in \mathbb{C}^{M \times 1}$, $x(t) \in \mathbb{C}^{m \times 1}$ is
the vector of Gaussian signal amplitudes with the property

$$\mathcal{E}\{x(t_1)\,x^H(t_2)\} = \begin{cases} P & \text{for } t_1 = t_2 \\ 0 & \text{for } t_1 \neq t_2, \end{cases} \qquad (3)$$

and $\eta(t) \in \mathbb{C}^{M \times 1}$ is additive white Gaussian noise. The
array-signal manifold matrix is
$B = [b(\theta_1), \ldots, b(\theta_m)] \in \mathbb{C}^{M \times m}$,
where each “steering vector” is

$$b(\theta_j) = \left[1,\ \exp\left(i 2\pi \tfrac{d_2 d}{\lambda}\sin\theta_j\right),\ \ldots,\ \exp\left(i 2\pi \tfrac{d_M d}{\lambda}\sin\theta_j\right)\right]^T \qquad (4)$$

and $\lambda$ is the wavelength of incident radiation.

For a uniformly-spaced linear array (ULA), the array-signal
manifold matrix $S = [s(\theta_1), \ldots, s(\theta_m)] \in \mathbb{C}^{M_\alpha \times m}$
is of Vandermonde structure, with

$$s(\theta_j) = \left[1,\ \exp(i\omega_j),\ \ldots,\ \exp(i[M_\alpha - 1]\omega_j)\right]^T \qquad (5)$$

where the spatial frequency is $\omega_j = 2\pi \frac{d}{\lambda}\sin\theta_j$, and $d$ is
the inter-element unit spacing.

By definition, the $M$-element NLA is a subarray of
the $M_\alpha$-element ULA. Their relationship may be
described by the $M \times M_\alpha$ binary selection (or incidence)
matrix $L$, where $L_{jk}$ is equal to unity in the $j$th row and
$d_j$th column (counting columns from zero, since $d_1 = 0$), and zero
otherwise. Thus the NLA manifold can be written $B = LS$, and the
(virtual) ULA p.d. Toeplitz spatial covariance matrix is
$T = S P S^H + p_0 I_{M_\alpha}$ and is related to the p.d. Hermitian
covariance matrix $R$ of the actual NLA, $R = B P B^H + p_0 I_M$, by the
crucial linear transformation $R = L T L^T$.
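The relation $B = LS$ can be checked with a small sketch. The helper names below are hypothetical, the virtual ULA size is assumed to be $M_\alpha = d_M + 1$, and the sensor positions are those of the five-element geometry quoted as (6); the spatial frequency value is arbitrary.

```python
import cmath
import math

def ula_steering(m_alpha, omega):
    """ULA steering vector s = [1, exp(i*omega), ..., exp(i*(M_alpha-1)*omega)]."""
    return [cmath.exp(1j * n * omega) for n in range(m_alpha)]

def selection_matrix(positions):
    """Binary M x M_alpha matrix with a single one per row, in column d_j
    (columns counted from zero). Assumes M_alpha = d_M + 1."""
    m_alpha = positions[-1] + 1
    return [[1 if col == d else 0 for col in range(m_alpha)] for d in positions]

def matvec(mat, vec):
    """Plain matrix-vector product for lists of lists."""
    return [sum(a * v for a, v in zip(row, vec)) for row in mat]

positions = [0, 1, 4, 9, 11]
L = selection_matrix(positions)
omega = 2 * math.pi * 0.5 * math.sin(math.radians(20.0))  # half-wavelength spacing
s = ula_steering(len(L[0]), omega)
b = matvec(L, s)   # NLA steering vector picked out of the ULA one
```

Each entry of `b` equals $\exp(i d_j \omega)$, which is the NLA steering vector for the same direction.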

We restrict this study to the class of identifiable scenarios,
i.e. situations with a one-to-one correspondence
between the set of signal parameters $(m, \theta, P, p_0)$ and
the covariance matrix $R$. Consequently, there is a one-to-one
correspondence between the $M$-variate (true)
covariance matrix $R$ and the $M_\alpha$-variate covariance
matrix $T$ for any $m$ of interest ($1 \le m \le M_\alpha - 1$).
In fact, identifiability is not guaranteed for sparse arrays
(identifiability issues are addressed in [1]).

Given $N$ independent snapshots, the sufficient statistic
for DOA estimation is the direct data covariance
(DDC) matrix $\hat R = \frac{1}{N}\sum_{t=1}^{N} y(t)\,y^H(t)$ that,
simultaneously, is the ML estimate of the unstructured (i.e.
arbitrary p.d. Hermitian) covariance matrix.

Partially augmentable NLAs have one or more missing
covariance lags [2], e.g. the geometry

$$\mathbf{d}_5 = [0, 1, 4, 9, 11] \qquad (6)$$

embodies all covariance lags except the single lag $t_6$.
Therefore, if the sample covariance lags $\hat R_{jk}$ of $\hat R$ are
used to construct an estimate of the augmented $M_\alpha$-variate
Toeplitz matrix, one or more diagonals of such
an augmented matrix will be missing.

Traditionally, the straightforward direct augmentation
approach (DAA) [3] is used to form the augmented
Toeplitz matrix:

$$\hat t_{j-k=\kappa} = \frac{\sum_{j,k} \hat R_{jk}\,\delta(\kappa, d_j - d_k)}{\sum_{j,k} \delta(\kappa, d_j - d_k)}, \quad j > k,\ \kappa \in S \qquad (7)$$

where $\delta(a,b)$ is the generalised Kronecker delta function,
and we have defined the complementary sets
$S = \{\kappa : t_\kappa \text{ is specified}\}$ and
$\bar S = \{\kappa : t_\kappa \text{ is unspecified}\}$.
For minimum redundancy arrays, such as $\mathbf{d}_5$, none of
the elements in $\hat R$ are redundant (obviously except for
$\hat R_{jj}$), and in this case

$$\hat t_{j-k=\kappa} = \sum_{j,k} \hat R_{jk}\,\delta(\kappa, d_j - d_k). \qquad (8)$$
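The averaging in (7) can be sketched as follows. The function name and the use of `None` to flag an unspecified lag are assumptions of this sketch; the zero lag is averaged over the diagonal, and the positions are assumed sorted so that a non-negative lag corresponds to $j \ge k$.

```python
def direct_augmentation(r_hat, positions):
    """Average sample covariances over all sensor pairs sharing a lag.

    r_hat is an M x M list of lists, positions the integer sensor
    locations in ascending order. Returns the list of lag estimates
    (None where no sensor pair provides the lag) and the missing lags.
    """
    m_alpha = positions[-1] + 1
    sums = [0j] * m_alpha
    counts = [0] * m_alpha
    for j, dj in enumerate(positions):
        for k, dk in enumerate(positions):
            lag = dj - dk
            if lag >= 0:
                sums[lag] += r_hat[j][k]
                counts[lag] += 1
    lags = [s / c if c else None for s, c in zip(sums, counts)]
    missing = [kappa for kappa, c in enumerate(counts) if c == 0]
    return lags, missing

# Synthetic check: build R from a known lag sequence (values are arbitrary).
positions = [0, 1, 4, 9, 11]
true_lags = [5.0, 1.0, 0.5, 0.25, 0.2, 0.1, 0.9, 0.05, 0.04, 0.03, 0.02, 0.01]
r_example = [[true_lags[abs(dj - dk)] for dk in positions] for dj in positions]
lags, missing = direct_augmentation(r_example, positions)
```

For the geometry (6) the only unspecified lag returned in `missing` is the sixth, in agreement with the text.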


Interestingly, (7) and especially (8) could be viewed
as the optimum unconstrained solution that yields the
minimum in the $L_2$ norm: $\|\hat R - L T L^H\|_2$. This is
important, since it is the $L_2$ norm (with an additional
weighting matrix) that constitutes COMET [4].

In our approach, we are looking for the p.d. Toeplitz
matrix estimate $T$, and for its $M$-variate linear
transformation $R = L T L^H$ that yields the (local) maximum
to the (e.g. sphericity test) likelihood ratio $\gamma(R)$:

$$\gamma(R) = \gamma_0^N(R), \qquad \gamma_0(R) = \frac{\det(R^{-1}\hat R)}{\left[\frac{1}{M}\,\mathrm{tr}(R^{-1}\hat R)\right]^M}, \qquad (9)$$

in the vicinity of $\hat T$, defined by (7), (8). Obviously, the
problem of $T$ estimation given the DDC matrix $\hat R$, with
some continuous (element-wise) transformation $\hat R \to \hat T$,
$T \to R$, is important for many other applications
(e.g. see [5]).


2.  PROBLEM  SOLUTION 


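Observe that the LR (9) could be presented as

$$\gamma_0(R) = \frac{\prod_{k=1}^{M} \lambda_k^{-1}}{\left[\frac{1}{M}\sum_{k=1}^{M} \lambda_k^{-1}\right]^M} \qquad (10)$$

where $\lambda_k$ ($k = 1, \ldots, M$) are eigenvalues of the matrix

$$G(R) = \hat R^{-1/2}\, R\, \hat R^{-1/2}. \qquad (11)$$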
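Written in terms of the eigenvalues $\mu_k$ of $R^{-1}\hat R$, the sphericity statistic in (9) is the ratio of their geometric mean to their arithmetic mean, raised to the power $M$. The sketch below evaluates it from a list of such eigenvalues; the function name is hypothetical and positive eigenvalues are assumed.

```python
import math

def sphericity_lr(mu):
    """gamma_0 = prod(mu) / (mean(mu))**M for positive eigenvalues mu of
    R^{-1} R_hat, computed in the log domain for numerical stability."""
    m = len(mu)
    log_num = sum(math.log(v) for v in mu)
    log_den = m * math.log(sum(mu) / m)
    return math.exp(log_num - log_den)

equal = sphericity_lr([2.0, 2.0, 2.0])      # all eigenvalues equal
spread = sphericity_lr([1.0, 2.0, 4.0])     # unequal eigenvalues
scaled = sphericity_lr([3.0, 6.0, 12.0])    # same spread, different scale
```

By the arithmetic-geometric mean inequality the value never exceeds one, it reaches one only when all eigenvalues coincide, and it is unchanged by a common scaling of the eigenvalues.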

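We present optimisation steps as a sequence of sufficiently
small perturbations $T_0 \to T_1 \to \cdots \to T$, where
$T_{k+1} = T_k + \delta(T_k)$; specifically

$$T_{k+1}(z) = T_k + \sum_{\kappa}\left(z_\kappa E_\kappa^{+} + z_\kappa^{*} E_\kappa^{-}\right) \qquad (12)$$

where $E_\kappa^{+}$ is the $M_\alpha \times M_\alpha$ matrix with ones on the
$\kappa$th superdiagonal and zeros elsewhere, and $E_\kappa^{-}$ is its
transpose,

$$[E_\kappa^{+}]_{pq} = \delta(q - p, \kappa), \qquad E_\kappa^{-} = (E_\kappa^{+})^T, \qquad (13)$$

or in terms of real variables

$$T_{k+1}(z) = T_k + \sum_{\kappa}\left[\mathrm{Re}\,(z_\kappa F_\kappa^{+}) + i\,\mathrm{Im}\,(z_\kappa F_\kappa^{-})\right] \qquad (14)$$

where

$$F_\kappa^{+} = E_\kappa^{+} + E_\kappa^{-}; \qquad F_\kappa^{-} = E_\kappa^{+} - E_\kappa^{-}. \qquad (15)$$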

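It is well known that the set of Toeplitz Hermitian matrices
is congruent to the set of real symmetric matrices
via the unitary transformation:

$$Q = \mathrm{Re}\,(T) + J_P\,\mathrm{Im}\,(T) = H T H^H; \qquad T = H^H Q H,$$

$$H = \tfrac{1}{2}\left[(I + J_P) + i(I - J_P)\right]; \qquad H^H H = I \qquad (16)$$

where

$$J_P = \begin{bmatrix} 0 & \cdots & 0 & 1 \\ 0 & \cdots & 1 & 0 \\ \vdots & & & \vdots \\ 1 & 0 & \cdots & 0 \end{bmatrix} \qquad (17)$$

is the permutation matrix. Evidently

$$\det Q = \det(H T H^H) = \det T; \qquad \mathrm{eig}(Q) = \mathrm{eig}(T). \qquad (18)$$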
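The transformation (16) is easy to verify numerically on a small example. The sketch below uses plain Python lists of complex numbers; the helper names and the three lag values are arbitrary choices made for illustration.

```python
def matmul(a, b):
    """Product of two square matrices stored as lists of lists."""
    n = len(a)
    return [[sum(a[p][r] * b[r][q] for r in range(n)) for q in range(n)]
            for p in range(n)]

def conj_transpose(a):
    n = len(a)
    return [[a[q][p].conjugate() for q in range(n)] for p in range(n)]

def build_h(n):
    """H = 0.5 * [(I + J) + i (I - J)], with J the exchange matrix."""
    h = [[0j] * n for _ in range(n)]
    for p in range(n):
        for q in range(n):
            eye = 1.0 if p == q else 0.0
            exch = 1.0 if p + q == n - 1 else 0.0
            h[p][q] = 0.5 * ((eye + exch) + 1j * (eye - exch))
    return h

def hermitian_toeplitz(first_col):
    """Hermitian Toeplitz matrix with T[p][q] = t_{p-q} for p >= q."""
    n = len(first_col)
    return [[first_col[p - q] if p >= q else first_col[q - p].conjugate()
             for q in range(n)] for p in range(n)]

t_lags = [3.0 + 0j, 1.0 + 0.5j, 0.2 - 0.3j]   # arbitrary example lags
T = hermitian_toeplitz(t_lags)
H = build_h(3)
Q = matmul(matmul(H, T), conj_transpose(H))
```

The resulting `Q` has a vanishing imaginary part, is symmetric, and matches $\mathrm{Re}(T) + J_P\,\mathrm{Im}(T)$ entry by entry.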
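We may then transform to real-valued perturbations:

$$Q_{k+1} = H T_{k+1}(z) H^H = H T_k H^H + \sum_{\kappa=1}^{M_\alpha}\left(x'_\kappa F_\kappa^{+} + x''_\kappa F_\kappa^{-}\right),$$

$$Q_{k+1} = Q_k + \sum_{\ell=1}^{2M_\alpha - 1} x_\ell F_\ell \qquad (19)$$

so for sufficiently small $x_\ell$, we can treat
$\sum_{\ell=1}^{2M_\alpha - 1} x_\ell F_\ell = \delta Q$
as a perturbation of the matrix $Q_k$.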

Supposing sufficiently small perturbations in $T_k$ (and
thus in $Q_k$), we have small perturbations in the iterate
$R_k$: $R_0 \to R_1 \to \cdots \to R$, and therefore in the iterate
$G_k$: $G_0 \to G_1 \to \cdots \to G(R)$. Then we can write the
first-order expansion of the eigenvalues of $G_{k+1}$ as:

$$\lambda_\ell(k+1) = \lambda_\ell(k) + \delta_\ell(k+1) \qquad (20)$$

with $|\delta_\ell(k+1)| \ll \lambda_\ell(k)$, so that the inverse
$\lambda_\ell^{-1}(k+1)$ is also given by a first-order expansion:

$$\lambda_\ell^{-1}(k+1) = \lambda_\ell^{-1}(k)\left[1 - \delta_\ell(k+1)\,\lambda_\ell^{-1}(k)\right] \qquad (21)$$

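then the LR $\gamma_0(R_{k+1})$ could also be presented
sufficiently accurately by its first-order expansion as:

$$\gamma_0(R_{k+1}) = \gamma_0(R_k)\,\frac{1 - \sum_{\ell=1}^{M}\delta_\ell(k+1)\,\lambda_\ell^{-1}(k)}{\left[1 - \dfrac{\sum_{\ell=1}^{M}\delta_\ell(k+1)\,\lambda_\ell^{-2}(k)}{\sum_{\ell=1}^{M}\lambda_\ell^{-1}(k)}\right]^{M}} \qquad (22)$$

$$\simeq \gamma_0(R_k)\exp\left[-\sum_{\ell=1}^{M}\delta_\ell(k+1)\left(\lambda_\ell^{-1}(k) - \frac{M\,\lambda_\ell^{-2}(k)}{\Lambda_k}\right)\right] \qquad (23)$$

where $\Lambda_k = \sum_{\ell=1}^{M}\lambda_\ell^{-1}(k)$. Obviously local LR
maximisation in its first-order approximation is equivalent
to minimisation of the linear function

$$\sum_{\ell=1}^{M}\delta_\ell(k+1)\left[\lambda_\ell^{-1}(k) - \lambda_\ell^{-2}(k)\,\frac{M}{\Lambda_k}\right] \qquad (24)$$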


subject to some constraints that ensure the perturbed
matrix $Q_{k+1}$ (and thus $T_{k+1}$) is p.d., and that the
perturbations (19) are sufficiently small for the validity
of the first-order expansions. Given the perturbations
of $Q_{k+1}$ (19), the perturbations in $G_{k+1}$ are

$$G_{k+1} = G_k + \sum_{\ell=1}^{2M_\alpha - 1} x_\ell\, \hat R^{-1/2} L H^H F_\ell H L^H \hat R^{-1/2} \qquad (25)$$

and according to eigenvalue perturbation theory [6],

$$\lambda_j(G_{k+1}) = \lambda_j(G_k) + \sum_{\ell=1}^{2M_\alpha - 1} x_\ell\, g_j^{(k)H}\left(\hat R^{-1/2} L H^H F_\ell H L^H \hat R^{-1/2}\right) g_j^{(k)} \qquad (26)$$

where $g_j^{(k)}$ is the $j$th ($j = 1, \ldots, M$) eigenvector of the
matrix $G_k$. Similarly, the eigenvalues of the perturbed
matrix $Q_{k+1}$ (and $T_{k+1}$ via (18)) can be written as:

$$\sigma_j(Q_{k+1}) = \sigma_j(Q_k) + \sum_{\ell=1}^{2M_\alpha - 1} x_\ell\left[u_j^{(k)H} F_\ell\, u_j^{(k)}\right]. \qquad (27)$$

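We now introduce the $M \times 2(M_\alpha - 1)$ matrix iterate $V_k$

$$V_k = \left[g_j^{(k)H}\, \hat R^{-1/2} L H^H F_\ell H L^H \hat R^{-1/2}\, g_j^{(k)}\right] \qquad (28)$$

and the $M_\alpha \times 2(M_\alpha - 1)$ matrix iterate $\bar V_k$

$$\bar V_k = \left[u_j^{(k)H} F_\ell\, u_j^{(k)}\right], \quad \ell = 1, \ldots, 2M_\alpha - 1. \qquad (29)$$

According to (26), we may present the vector
$\Delta_{k+1} = [\delta_1(k+1), \ldots, \delta_M(k+1)]^T$ as
$\Delta_{k+1} = V_k x$ and perturbations to eigenvalues of the matrix
$Q_{k+1}$ as $\bar V_k x$.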
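The first-order eigenvalue expansion used in (26) and (27) can be checked on a real symmetric 2-by-2 example, for which eigenpairs are available in closed form. Everything below (names, matrices, step size) is a hypothetical illustration and assumes a non-zero off-diagonal entry.

```python
import math

def eig_sym_2x2(a, b, d):
    """Eigenvalues (ascending) and unit eigenvectors of [[a, b], [b, d]].
    Assumes b != 0 so that (b, lam - a) is a valid eigenvector."""
    mean = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    vals = [mean - radius, mean + radius]
    vecs = []
    for lam in vals:
        vx, vy = b, lam - a
        norm = math.hypot(vx, vy)
        vecs.append((vx / norm, vy / norm))
    return vals, vecs

def first_order_eigs(g, f, x):
    """Predict the eigenvalues of G + x*F as lambda_j + x * g_j^T F g_j."""
    vals, vecs = eig_sym_2x2(*g)
    fa, fb, fd = f
    return [lam + x * (fa * vx * vx + 2.0 * fb * vx * vy + fd * vy * vy)
            for lam, (vx, vy) in zip(vals, vecs)]

G = (2.0, 0.5, 1.0)   # entries (a, b, d) of a symmetric matrix
F = (0.0, 1.0, 0.0)   # symmetric perturbation direction
x = 1e-3              # small perturbation size
predicted = first_order_eigs(G, F, x)
exact, _ = eig_sym_2x2(G[0] + x * F[0], G[1] + x * F[1], G[2] + x * F[2])
```

The discrepancy between `predicted` and `exact` is of second order in `x`, which is why the perturbations must be kept small for the expansion to be trusted.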

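Finally, with all introduced notations, we can formulate
the problem of the optimum perturbation as
the linear programming problem:

Find

$$\min_x\ \left(L_k^T V_k x\right) \quad \text{subject to} \qquad (30)$$

$$-\sigma_k - \bar V_k x \le -\sigma_0 \mathbf{1} \qquad (31)$$

$$-\varepsilon \le x_\ell \le \varepsilon, \quad \ell = 1, \ldots, 2M_\alpha - 1 \qquad (32)$$

where

$$L_k = \left[\lambda_\ell^{-1}(k) - \frac{M\,\lambda_\ell^{-2}(k)}{\sum_{n=1}^{M}\lambda_n^{-1}(k)}\right] \qquad (33)$$

for $\ell = 1, \ldots, M$; $\sigma_k$ is the $M_\alpha$-variate vector of
eigenvalues of $Q_k$, and $\sigma_0$ is the minimum eigenvalue of the
initial matrix $T_0$.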
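The role of the cost coefficients in the linear objective can be illustrated numerically. The sketch below assumes that the LR depends on the reciprocals of the eigenvalues $\lambda_\ell$ of $G$, as in the first-order expansions above, and it applies a hypothetical small eigenvalue perturbation chosen only by the sign of each coefficient; the positive-definiteness constraints and the mapping from $x$ to the eigenvalue perturbations are deliberately left out.

```python
import math

def gamma0_from_lambdas(lambdas):
    """Sphericity LR expressed through mu = 1/lambda (assumed convention)."""
    mu = [1.0 / lam for lam in lambdas]
    m = len(mu)
    return math.exp(sum(math.log(v) for v in mu) - m * math.log(sum(mu) / m))

def lr_cost_vector(lambdas):
    """Coefficients lambda^-1 - M * lambda^-2 / sum(lambda^-1) that multiply
    the eigenvalue perturbations in the linear objective."""
    m = len(lambdas)
    big_lambda = sum(1.0 / lam for lam in lambdas)
    return [1.0 / lam - m / (lam * lam * big_lambda) for lam in lambdas]

lambdas = [0.8, 1.0, 1.5]            # arbitrary example eigenvalues
cost = lr_cost_vector(lambdas)
eps = 1e-3                           # hypothetical step size
# Move each eigenvalue against the sign of its cost coefficient.
deltas = [-eps if c > 0 else eps for c in cost]
perturbed = [lam + d for lam, d in zip(lambdas, deltas)]

base_lr = gamma0_from_lambdas(lambdas)
exact_lr = gamma0_from_lambdas(perturbed)
predicted_lr = base_lr * math.exp(-sum(d * c for d, c in zip(deltas, cost)))
```

For a small `eps` the exponential prediction tracks the exact LR closely and the LR increases, which is the behaviour the step-size control is meant to preserve.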

The $M_\alpha$ linear constraints (31) keep the optimised
Toeplitz matrix $T_{k+1}$ (and hence $R_{k+1}$) p.d., while the
linear constraints (32) ensure the perturbations are
sufficiently small. Of course, step size management is
necessary to ensure that the solutions obtained from both
expansions (26) and (27) are valid, and that the LR
is actually increased, whilst maintaining $T_{k+1} > 0$. If
either condition is not satisfied, the “step size” $\varepsilon$ is
reduced, and the LP problem is solved again. If both
conditions are met, we compute $Q_{k+1}$ and use it for
the next iteration.

Finally, we need to specify the initialisation of our
routine that produces an initial p.d. matrix $T_0$, given
the DAA complete matrix $\hat T$. This step is explicitly
described in [2], since we use maximum entropy (ME)
completion as the starting point ($T_0 = T_{ME}$). In [2], we
demonstrated that convex programming routines could
be used to check the feasibility conditions (i.e. that $\hat T$
could be p.d. completed), and to introduce $L_2$-minimal
perturbations to the specified entries to actually achieve
a p.d. completion. Next, the “missing” elements in
$\hat T$ are optimised to achieve the ME condition. We
emphasise that the missing lags are not present in
$R = L T L^H$, and thus they do not affect the optimised
LR. Nevertheless, due to the p.d. condition, these lags
need to be optimised in order to give more freedom
to the specified-lag perturbations in pursuing the
maximum LR. In our full detection-estimation algorithm
(forthcoming paper), the solution $T$ is the one that
corresponds to the maximum number of independent
sources ($M_\alpha - 1$), thus yielding the maximum LR.
Further equalisation of the last ($M_\alpha - \mu$) eigenvalues, also
conducted with an iterated LP optimisation, generates
"candidate models" $T_\mu$, with correspondingly degraded
LR. Obviously, optimisation of the missing lags is
crucial now for equalisation, and so the two-stage
procedure is proposed. In the first stage, we optimise only
the missing lags (that keep the LR unchanged), and
only after convergence do we modify all entries for
accurate equalisation, causing a degradation in LR.

3.  SIMULATION  RESULTS  AND 
CONCLUSIONS 

To demonstrate the efficiency of our method, we present
the results of 1000 Monte-Carlo trials, conducted for
the antenna array $\mathbf{d}_5$ and the identifiable six-source
scenario $w_6 = [-0.90, -0.68, -0.46, -0.24, -0.02, 0.20]$.
Obviously, since the sixth lag is missing in $\hat T$, no
existing technique is directly applicable for detection and
DOA estimation. Fig. 1 presents distributions of the
LR calculated for:

(a) the true (exact) covariance matrix for $\mathbf{d}_5 + w_6$,

(b) the ME completion $T_0 = T_{ME}$ [2],

(c) the locally optimal ML completion $T_{ML}$.

Note that $T_{ME}$ experienced minimal perturbations to
the specified $\hat R$ entries in order to achieve feasibility.
Despite the perturbations in $T_{ME}$ being minimal in the
$L_2$ sense, the LR distribution is significantly worse than
that for the optimised solution $T_{ML}$. More importantly,
comparison of the LR distributions for $T_{ML}$ and the true $T$
demonstrates that in many cases the LR-optimised solution
$T_{ML}$ exhibits an LR greater than the true covariance
matrix generated for the same sufficient statistics $\hat R$.
Similarly to the fully augmentable case [7], LR($T_{ML}$)
is clearly right-skewed compared with LR($R$). Though
we still cannot prove that in every trial the global LR
maximum has been achieved, this is a sufficient
argument to treat further attempts to improve the LR
(via local LR optimisation) as statistically unproductive
in terms of detection-estimation efficiency. The
introduced results also demonstrate that, even for
sufficiently large sample volumes, the LR remains very
sensitive and in many cases optimisation according to
some “related" criteria (such as minimum Euclidean
norm in COMET [4]) could lead to results that are
surprisingly far from the optimum.
ABSTRACT

EEG-based brain maps are very useful in anatomical,
functional and pathological diagnosis. These images are
projections of the energy of the signals in four different
frequency bands. Joint Approximate Diagonalization of
Eigenmatrices (JADE) is used as an effective tool in
deconvolution of EEG signals prior to spectrum
estimation. The algorithm also restores the signal from
the noise as a result of Higher Order Statistics (HOS)
estimation. The spectrum is estimated using
autoregressive (AR) modelling and pseudo-hot colours
are used to represent brain activities. The results show a
great enhancement of the diagnostic features in the
reconstructed images. The overall system also enables
real-time reconstruction of the images for patient
monitoring purposes.

1.  INTRODUCTION 

EEGs project electrical activities of the brain [1][2][3].
The EEG is divided into four sub-bands. Delta activity is
around 4 Hz or below. It tends to be the highest in
amplitude. It is the dominant rhythm in infants and in
stages 3 and 4 of sleep. It may occur focally with
subcortical lesions and in general distribution with diffuse
lesions, metabolic encephalopathy, hydrocephalus or deep
midline lesions. It is usually most prominent frontally in
adults. Theta activity has a frequency of 4 to 8 Hz. It is
abnormal in awake adults but is perfectly normal in
children up to 13 years and in sleep. It can be seen as a
focal disturbance in focal subcortical lesions. Alpha
waves are those between 8 and 14 Hz. Alpha is usually
best seen in the posterior regions of the head on each
side, being higher in amplitude on the dominant side. It is
brought out by closing the eyes and by relaxation, and
abolished by eye opening or alerting by any mechanism.
It is the major rhythm seen in normal relaxed adults. Beta
activity is 'fast' activity. It has a frequency of 14 Hz and
above (normally up to about 40 Hz). It is usually seen
on both sides in symmetrical distribution and is most
evident frontally. It may be absent or reduced in areas of
cortical damage. It is generally regarded as a normal
rhythm. It is the dominant rhythm in patients who are
alert or anxious or who have their eyes open.
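The band limits quoted above, together with the per-band energy that the brain maps display, can be captured in a short sketch. The dictionary name, the half-open handling of band edges, the 0 Hz lower limit for delta and the 40 Hz upper limit for beta are assumptions made for illustration.

```python
# Band edges in Hz, following the ranges given in the text (edges are
# treated as half-open intervals, which is an assumption of this sketch).
BANDS_HZ = {
    "delta": (0.0, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 14.0),
    "beta": (14.0, 40.0),
}

def band_energies(freqs_hz, spectrum):
    """Sum |X(f)|^2 over the spectral samples that fall in each band."""
    energies = {name: 0.0 for name in BANDS_HZ}
    for f, x in zip(freqs_hz, spectrum):
        for name, (low, high) in BANDS_HZ.items():
            if low <= f < high:
                energies[name] += abs(x) ** 2
                break
    return energies

# Toy spectrum: one sample per band plus one sample outside all bands.
example = band_energies([2.0, 6.0, 10.0, 20.0, 50.0], [1.0, 2.0, 3.0, 1j, 5.0])
```

Samples outside every band, such as the 50 Hz one in the toy input, are simply ignored.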

In this paper, we address the issue of finding an
appropriate and accurate method to extract the spectrum
of each actual EEG signal. This is done by blind
deconvolution of the signals followed by AR-based
spectrum estimation. Reconstruction of the brain map
will then be more accurate and informative. A triangular
cubic interpolation criterion is exploited in mapping the
energy of each frequency band onto the image of a cross
section of the brain. Four images represent the activity of
the brain in the above four frequency subbands. Each EEG
signal is actually a combination of an unknown number
of sources inside the brain plus various internal signals,
such as heart rate, cardiovascular and muscular signals,
and external signals such as system noise. A suitable
procedure for deconvolution and restoration of the signals
prior to neuro-image reconstruction is therefore highly
desirable.

2.  BLIND  SIGNAL  SEPARATION 

Blind signal separation is the process of extracting the
unknown source signals from their combinations. The
channel and the permutation of the output signals are also
unknown. The simple model is as follows:

Figure 1. BSS block diagram

Noise normally appears as a disturbance to the signal.
Because of the correlation between the two
measurements, it is in principle possible to separate out
the noise. Here, it is assumed that the EEG signals are
linear, instantaneous combinations of the sources.
Under these assumptions, the problem can be rewritten in
matrix formulation as

$$X_k = A S_k + E_k \qquad (1)$$
and

$$Y_k = B X_k \qquad (2)$$

where $E_k$ is a white Gaussian noise vector and $A$ and $B$ are
unknown, constant matrices of sizes $m \times n$ and $n \times m$,
respectively.

It is observed that producing output signals that are
decorrelated is relatively easy, whereas achieving
independent outputs requires more work. The
mathematics of the separation task helps to explain this
observation. We aim to produce independent outputs, i.e.,

$$p(y_i(k), y_j(k)) = p(y_i(k))\,p(y_j(k)) \qquad (3)$$

for all pairs of outputs. A necessary but insufficient
condition for having independent outputs is to have
uncorrelated outputs, i.e., $E[y_i(k)y_j(k)] = E[y_i(k)]E[y_j(k)]$
for all pairs of outputs. A stronger condition is
$E[f(y_i(k))g(y_j(k))] = E[f(y_i(k))]E[g(y_j(k))]$, where $f(\cdot)$ and
$g(\cdot)$ are nonlinear functions.

To form the decorrelated outputs $y(k) = C x(k)$, where $C$
is a constant, linear transformation, let the covariance
matrix of the measurements be $R = E[x(k)x(k)^T]$.
Decorrelation of the output requires

$$E[y(k)y(k)^T] = E[C x(k) x(k)^T C^T] = C R C^T = D \qquad (4)$$

where $D$ is a diagonal matrix. Let $V$ be a matrix formed
by assembling the eigenvectors of $R$ into a matrix and $W$
be a diagonal matrix whose main diagonal contains the
eigenvalues of $R$.

Let $C = W^{-1/2} V^T$. As the eigenvectors of $R$ form an
orthogonal basis, it follows that:

$$C R C^T = W^{-1/2} V^T V W V^T V W^{-1/2} = I \qquad (5)$$

However, the solution $C = U W^{-1/2} V^T$, where $U$ is an
arbitrary unitary matrix, also decorrelates the outputs.
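As a small numerical check of this decorrelating transformation, the sketch below builds $C = W^{-1/2}V^T$ for a real symmetric positive-definite 2-by-2 covariance matrix using closed-form eigenpairs. The function names and the example matrix are assumptions for illustration, and a non-zero off-diagonal entry is assumed.

```python
import math

def whitening_matrix_2x2(r):
    """Return C = W^{-1/2} V^T for a symmetric p.d. matrix [[a, b], [b, d]].
    Each row of C is a unit eigenvector scaled by 1/sqrt(eigenvalue).
    Assumes b != 0."""
    (a, b), (_, d) = r
    mean = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    rows = []
    for lam in (mean - radius, mean + radius):
        vx, vy = b, lam - a
        scale = 1.0 / (math.hypot(vx, vy) * math.sqrt(lam))
        rows.append([vx * scale, vy * scale])
    return rows

def congruence(c, r):
    """Compute C R C^T for 2x2 lists of lists."""
    cr = [[sum(c[i][k] * r[k][j] for k in range(2)) for j in range(2)]
          for i in range(2)]
    return [[sum(cr[i][k] * c[j][k] for k in range(2)) for j in range(2)]
            for i in range(2)]

R_example = [[2.0, 0.8], [0.8, 1.0]]   # arbitrary covariance matrix
C = whitening_matrix_2x2(R_example)
whitened_cov = congruence(C, R_example)
```

Up to rounding, `whitened_cov` is the identity matrix, so the transformed outputs are uncorrelated with unit variance.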

The JADE algorithm was proposed by Cardoso [4]. This
procedure uses matrices $Q_z(M)$ formed by the inner
product of the fourth-order cumulant tensor of the outputs
with an arbitrary matrix $M$, i.e.,

$$[Q_z(M)]_{ij} = \sum_{k=1}^{n}\sum_{l=1}^{n} \mathrm{Cum}(z_i, z_j^*, z_k, z_l^*)\, m_{lk} \qquad (6)$$

where the $(l,k)$th component of the matrix $M$ is written as
$m_{lk}$ and $Z_k = C Y_k$. The matrix $Q_z(M)$ has the important
property that it is diagonalized by the correct rotation
matrix $U$, i.e., $U^H Q_z(M) U = \Lambda_M$, where $H$ denotes the
complex (Hermitian) transpose operator, and $\Lambda_M$ is a
diagonal matrix whose diagonal elements depend on the
particular matrix $M$ as well as $Z_k$. Using equation (6),
a set of cumulant matrices $Q_z(M)$ can be calculated for a
set of different matrices $M$. The desired rotation
matrix $U$ then jointly diagonalizes these matrices. The
required procedure for estimation of $A$ can be
summarized in the following steps:


1. Form the sample covariance matrix
$\hat R_x = E\{x(t)x^T(t)\}$ and compute an SVD of $\hat R_x$,

$$\hat R_x = [U_s\ U_n]\,\mathrm{diag}(\lambda_1^2, \ldots, \lambda_m^2)\,[U_s\ U_n]^H,$$

where $\lambda_i^2 = \sigma^2$, $i = N+1, \ldots, m$, and $\sigma^2$ is the
noise variance.

2. Estimate the number of sources $N$ by the number
of singular values that do not equal $\sigma^2$. The
matrix $U_s$ is composed of the $N$ singular vectors whose
singular values do not equal $\sigma^2$.

3. Whiten the data $x(t)$ by

$$y(t) = W x(t)$$

where $W = D^{-1/2} U_s^H$ and the diagonal matrix is
given as

$$D = \mathrm{diag}\{d_1^2, d_2^2, \ldots, d_N^2\} \qquad (10)$$

where $d_i^2 = \lambda_i^2 - \sigma^2$, $i = 1, 2, \ldots, N$.

4. Form the sample fourth-order cumulant matrix $Q$ of
the whitened data $y(t)$.

5. Compute the $N$ most significant eigenpairs
$\{\lambda_r, M_r \mid 1 \le r \le N\}$. This is done by first
computing the eigen-decomposition of $Q$,

$$Q = [U_s\ U_n]\,\mathrm{diag}(\lambda_1, \lambda_2, \ldots)\,[U_s\ U_n]^H. \qquad (11)$$

6. Take each column of $U_s$ and reshape the eigenvector
into an $N \times N$ matrix $M_r$, which is called an
eigenmatrix.

7. Jointly diagonalize the set $N_e$ defined as
$N_e = \{\lambda_r, M_r : 1 \le r \le N\}$ by a unitary matrix $V$.
The diagonalization is done by a modified Jacobi
rotation technique.

8. The estimated mixing matrix is then

$$\hat A = W^{\#} V \qquad (12)$$

9. By inverting $\hat A$, the estimated source signal will be

$$\hat Y = \hat A^{-1} X \qquad (13)$$

3.  SPECTRUM  ESTIMATION 

The Yule-Walker autoregressive (AR) method [5] is a
parametric method that estimates the autocorrelation
function to solve for the AR model parameters. The
method is superior to the DFT since it avoids noise and
blocking effects. The AR parameters, $a_k$, are estimated
by minimisation of the residual signal, $e(n)$:

$$e(n) = x(n) - \sum_{k=1}^{p} a_k\, x(n-k) \qquad (14)$$

where $p$ is the prediction order, which is chosen so as
to minimise $e(n)$. The Durbin algorithm [3] efficiently
estimates a set of coefficients by computing the
autocorrelation matrix and setting the partial derivatives
of the error with respect to the $a_k$ to zero. The minimum
value for the prediction order $p$ can be identified using an
iterative procedure. The procedure sets $p$ so that the
average error falls below a low threshold level [6].
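The recursion behind this kind of estimate can be sketched with the standard Levinson-Durbin algorithm, which solves the Yule-Walker equations for the $a_k$ in (14) from autocorrelation values. The function names and the simple threshold rule for choosing $p$ are assumptions for illustration; the paper does not specify its exact order-selection procedure.

```python
def levinson_durbin(r, order):
    """Solve the Yule-Walker equations for a_1..a_p in
    x(n) ~ sum_k a_k x(n-k), given autocorrelations r[0..p].
    Returns (coefficients, final prediction-error power)."""
    a = []
    err = r[0]
    for m in range(1, order + 1):
        # Reflection coefficient from the current prediction residual.
        acc = r[m] - sum(a[k] * r[m - 1 - k] for k in range(m - 1))
        refl = acc / err
        a = [a[k] - refl * a[m - 2 - k] for k in range(m - 1)] + [refl]
        err *= (1.0 - refl * refl)
    return a, err

def select_order(r, max_order, threshold):
    """Smallest order whose prediction-error power falls below threshold
    (hypothetical selection rule); returns max_order if none does."""
    for p in range(1, max_order + 1):
        _, err = levinson_durbin(r, p)
        if err < threshold:
            return p
    return max_order

# Autocorrelation of an AR(1) process with coefficient 0.5 and unit power.
r_ar1 = [1.0, 0.5, 0.25, 0.125]
coeffs, err_power = levinson_durbin(r_ar1, 2)
chosen = select_order(r_ar1, 3, 0.8)
```

For this AR(1) autocorrelation the second coefficient comes out as zero, so increasing the order beyond one brings no further reduction in error power.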

4.  IMPLEMENTATION  AND  RESULTS 


AR is preferred to the FFT due to its suppression of noise and
blocking effects, especially in the theta and beta sub-bands,
where the activity is low when compared to the delta and
alpha subbands. BSS is initially applied to decorrelate the
EEG signals. Application of BSS in the processing of EEGs
greatly changes the results. The spectrum of each signal
is estimated using the AR method. The spectra are divided
into the four subbands referred to earlier. The energy of each
subband is measured by using the following equation:

$$E_b = \sum_{\omega \in \text{band } b} |X(\omega)|^2, \quad b = \text{Delta, Theta, Alpha, Beta} \qquad (15)$$

These energy values are allocated to the actual
geometrical positions of the electrodes in the model
image as the amplitudes of those points. Then the
amplitudes are extrapolated to a surface and hot colours
are used to highlight the levels of activity on these
surfaces. Figures 2 and 3 illustrate the effect of BSS
followed by AR on these images.

Figure 2: Reconstructed brain map of a normal person using BSS
(left) and without using BSS (right).

Figure 3: Reconstructed brain map of a CJD patient using AR
without BSS (left) and with BSS (right).

The regions of activity are changed after application of
BSS. This is more or less expected as each EEG can be
assumed to be a combination of the electrode and its
adjacent pin signals including noise. The BSS algorithm
will serve to deconvolve the desired electrode signal from
the rest. In this case each signal carries the local
ABSTRACT

A novel approach is proposed in order to efficiently separate
mixed evoked potentials (EPs) presented simultaneously by
different stimuli. This approach is developed as follows. We
first apply a robust independent component analysis (ICA)
approach to the observed sensor signals for the separation of
the superimposed EP signals. Next, the desired EP components
are estimated by matched-filtering the separated signals.
The impulse response of such a matched filter can be computed
based on third-order cumulants of the filter input signal.
Therefore, due to the tolerance of the third-order cumulants to
both Gaussian and any symmetrically distributed non-Gaussian
noise or interference, the filter impulse response will be matched
with the desired signal alone. It is demonstrated by extensive
computer simulations that applying the cumulant-based ICA
and filtering dramatically improves the SNR of the final
estimation of the EP components.

1.  INTRODUCTION 

The sensory brain evoked potentials (EPs) are electrical
responses of the central nervous system to sensory stimuli
applied in a controlled manner. The interest in these potentials
arises from their utilization as clinical and research tools and
for their contribution to the basic understanding of the
functions of the brain [1]-[3], [5], [7]. Ensemble averaging and
weighted ensemble averaging have usually been used to
enhance the SNR [1]. Such techniques can be thought of as
lowpass filtering of noise, and a very large number of sweeps is
required to obtain a suitable EP estimate. Wiener filtering
based techniques have also been extensively used for the
enhancement and recovery of the EP [3], [7]. In one adaptive
implementation of the Wiener filter, the noisy EPs are taken as
the primary input, while the auxiliary reference input is
taken as constructed models of the EPs because the reference
noise is in general not available. Various kinds of basis
functions have been used to construct such models [4]. The
performance of this approach is then dependent on how close
the assumed model is to the EP signal. In another
approach, where multiple sweeps are available, the primary
input is taken as the ensemble average while the reference input
is taken as one sweep that is not included in the average, which
keeps the noise uncorrelated. Unfortunately, the Wiener filtering
method deteriorates if the signal and noise spectra
overlap.


In the present work, we consider the separation of single-trial
EP components elicited simultaneously by two or more different
stimuli. Because most blind signal separation techniques cannot
handle additive noise, we propose to enhance the signal-to-noise
ratio (SNR) of the independent components estimated by a robust
(i.e., unbiased) ICA approach. This SNR enhancement is achieved
by matched filtering the ICA output signals. The impulse
response of each matched filter is computed from third-order
cumulants of the filter input signal. Because third-order
cumulants are insensitive to Gaussian noise and to any
symmetrically distributed non-Gaussian noise, the filter impulse
response is matched to the desired signal alone. In Section 2 we
formulate the problem and briefly review blind signal separation
and the cumulant-based filtering approach. In Section 3, the
proposed approach is described. Section 4 presents simulation
results, and Section 5 gives the conclusions.

2.  CONVENTIONAL  METHODS 
2.1.  Signal  Model  and  Problem  Formulation 

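The m observations of the EP signals,
$\mathbf{x} = [x_1\; x_2\; \ldots\; x_m]^T$, can be modeled as a
mixture of independent signals plus noise, given by

$$\mathbf{x} = \mathbf{A}\mathbf{s} + \mathbf{v} \qquad (1)$$

where $\mathbf{A}$ is an $m \times n$ full-column-rank matrix with
$m > n$, $\mathbf{s} = [s_1\; s_2\; \ldots\; s_n]^T$ is an
$n \times 1$ vector gathering the independent EP sources, and
$\mathbf{v}$ is an $m \times 1$ vector of additive noise
representing the ongoing EEG of brain activity. We assume that
the EP signals have nonzero third-order correlations. The
objective is to estimate the independent sources $s_i$ given the
observed noisy mixed signals $\mathbf{x}$, i.e., to find a
separating matrix $\mathbf{W}$ so that

$$\mathbf{y} = \hat{\mathbf{s}} = \mathbf{W}\mathbf{x} = \mathbf{W}\mathbf{A}\mathbf{s} = \mathbf{P}\mathbf{D}\mathbf{s} \qquad (2)$$

where $\mathbf{P}$ is any permutation matrix and $\mathbf{D}$ is a
diagonal scaling matrix.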
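
The mixing model and the permutation and scaling ambiguity of the separating matrix can be illustrated with a short numerical sketch. Everything below is an assumption made for illustration only: the exponential sources, the noise level, and the "oracle" separating matrix built from the pseudo-inverse of the known mixing matrix (a real separation algorithm never has access to it).

```python
import numpy as np

rng = np.random.default_rng(0)
n_sources, n_sensors, n_samples = 2, 4, 1000   # n < m, as the model requires

# Hypothetical skewed (non-Gaussian) sources standing in for the EPs.
s = rng.exponential(size=(n_sources, n_samples)) - 1.0

A = rng.standard_normal((n_sensors, n_sources))         # full column rank (almost surely)
v = 0.1 * rng.standard_normal((n_sensors, n_samples))   # additive sensor noise

x = A @ s + v                                           # mixing model, equation (1)

# Oracle separating matrix: pseudo-inverse of A followed by an arbitrary
# permutation P and diagonal scaling D, which equation (2) leaves undetermined.
P = np.array([[0.0, 1.0], [1.0, 0.0]])
D = np.diag([2.0, -0.5])
W = P @ D @ np.linalg.pinv(A)

G = W @ A     # global matrix: equals P @ D, one nonzero entry per row
y = W @ x     # separated outputs; they still contain the projected noise W @ v
```

The last line shows why separation alone does not remove the noise: y equals PDs plus Wv, so a further noise-reduction stage is still needed.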

Blind signal separation methods often assume noise-free mixed
observations, i.e., $\mathbf{v} = 0$ or negligibly small. The
challenge is then both to achieve robust separation and to
reduce the noise. Some separation methods are robust to noise
when estimating the separating matrix $\mathbf{W}$; however, the
noise-free sources cannot be obtained directly from $\mathbf{W}$
and the observed signals $\mathbf{x}$.

In this paper our proposal is first to apply robust ICA and next
to filter the independent components. In the last stage we can
reconstruct clean (corrected) sensor signals $\hat{\mathbf{x}}_i$
by back projection of the filtered components, one by one.

2.2. Robust ICA in Gaussian noise

There are several efficient batch algorithms that are
theoretically insensitive to additive noise (provided the number
of available samples is sufficiently large), including JADE
(Joint Approximate Diagonalization of Eigenmatrices) [8], ROSBI
(Robust Second Order Blind Identification) [9] and ERICA
(Equivariant Robust ICA) [10], [11]. In this paper, we use the
family of ERICA algorithms based on third- and/or fourth-order
matrix cumulants. The ERICA algorithm developed by Cruces et al.
[10], [11] estimates the separating matrix $\mathbf{W}(l+1)$
(unbiased with respect to Gaussian noise) as follows

$$\mathbf{W}(l+1) = \mathbf{W}(l) + \eta(l)\left[\mathbf{I} - \mathbf{C}_{1,\beta}(\mathbf{y},\mathbf{y})\,\mathbf{Sig}_y\right]\mathbf{W}(l) \qquad (3)$$

where $l$ is the iteration number, or alternatively, for
prewhitened (sphered) data (using for example SVD or factor
analysis), as

$$\mathbf{W}(l+1) = \mathbf{W}(l) + \eta(l)\left[\mathbf{Sig}_y\,\mathbf{C}_{\beta,1}(\mathbf{y},\mathbf{y}) - \mathbf{C}_{1,\beta}(\mathbf{y},\mathbf{y})\,\mathbf{Sig}_y\right]\mathbf{W}(l) \qquad (4)$$

where $C_{\alpha}(y_i)$ denotes the $\alpha$-order cumulant of the
separated signal $y_i$, and $\mathbf{C}_{\alpha,\beta}(\mathbf{y},\mathbf{y})$
denotes a matrix cumulant whose $(i,j)$th element is given by the
cross-cumulant function

$$C_{\alpha,\beta}(y_i, y_j) = \mathrm{Cum}(\underbrace{y_i, \ldots, y_i}_{\alpha}, \underbrace{y_j, \ldots, y_j}_{\beta})$$

and $\mathbf{Sig}_y$ is a short-hand notation for the diagonal
matrix containing the signs of the diagonal cumulants,
$\mathbf{Sig}_y = \mathrm{diag}(\mathrm{sign}(\mathrm{diag}(\mathbf{C}_{1,\beta}(\mathbf{y},\mathbf{y}))))$.
For example, for the fourth-order matrix cumulants ($\beta = 3$),
$[\mathbf{Sig}_y]_{ii} = \mathrm{sign}(C_{1,3}(y_i, y_i))$ (the
actual kurtosis signs of the output signals) and
$\mathbf{C}_{1,3}(\mathbf{y},\mathbf{y})$ is the fourth-order
cross-cumulant with elements
$[\mathbf{C}_{1,3}(\mathbf{y},\mathbf{y})]_{ij} = \mathrm{Cum}(y_i, y_j, y_j, y_j)$;
the cross-cumulant matrix is defined analogously as
$\mathbf{C}_{3,1}(\mathbf{y},\mathbf{y}) = (\mathbf{C}_{1,3}(\mathbf{y},\mathbf{y}))^T$.

Since the third-order cumulants are insensitive to additive
Gaussian noise and to any symmetrically distributed noise, it is
recommended to use the third-order cumulants rather than the
fourth-order cumulants.

2.3. Cumulant-Based Filtering

In [12], we have shown that it is possible, based on third-order
cumulants, to design an FIR filter that is matched to the
desired EP signal. Let the $i$th signal $y_i(k)$ be the input of
such a matched filter; the filter impulse response is taken as a
replica of a selected one-dimensional third-order cumulant slice
of $y_i(k)$. That is, $h_i(j)$ is given by [12]

$$h_i(j) = \begin{cases} c_y(J - j), & j = 0, 1, \ldots, J \\ c_y(j - J), & j = J+1, J+2, \ldots, 2J \end{cases} \qquad (5)$$

where

$$c_y(j) = \frac{1}{K} \sum_{k=0}^{K-1} y_i(k)\, y_i(k+j)\, y_i(k+T) \qquad (6)$$

where $K$ is the number of available time samples and $T > 0$ is a
positive time shift. The filter output $\hat{y}_i(k)$, the
enhanced version of the $i$th filter input signal $y_i(k)$, is the
convolution sum of the input and the impulse response, given by

$$\hat{y}_i(k) = \gamma_i \sum_{j=0}^{2J} h_i(j)\, y_i(k-j) \qquad (7)$$

where $\gamma_i$ is a constant chosen so as to provide unity
skewness gain for the filter.

3. THE PROPOSED APPROACH

The proposed approach is developed as follows. We first apply
the robust ICA approach described in Subsection 2.2 to the
observed sensor signals to separate the superimposed EPs.
Although the robust ICA is capable of separating the independent
components, it is not capable of reducing additive noise. The
idea is therefore to enhance the SNR of the ICA outputs by
passing them through a bank of cumulant-based FIR filters of the
type described in Subsection 2.3, with the ICA output $y_i(k)$
taken as the input of the $i$th filter. The advantage of this
filtering technique arises from the fact that the third-order
cumulant of the evoked potential (modeled as a sum of damped
sinusoidal signals) preserves the evoked potential structure, in
addition to being insensitive to additive Gaussian noise and
other symmetrically distributed non-Gaussian noise [12].
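
The filtering stage can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation of [12]: the impulse response is assumed to be a mirrored replica of the one-dimensional third-order cumulant slice (length 2J+1), the gain `gamma` is a placeholder normalisation rather than the unity-skewness-gain constant, and the damped-sinusoid test signal and all function names are hypothetical.

```python
import numpy as np

def third_order_slice(y, max_lag, T):
    """Sample estimate c(j) = (1/K) * sum_k y(k) y(k+j) y(k+T), j = 0..max_lag."""
    y = np.asarray(y, dtype=float)
    K = y.size
    c = np.zeros(max_lag + 1)
    for j in range(max_lag + 1):
        n = K - max(j, T)            # number of terms with all indices in range
        c[j] = np.sum(y[:n] * y[j:j + n] * y[T:T + n]) / K
    return c

def cumulant_filter(y, J, T=1):
    """Impulse response of length 2J+1: c(J), ..., c(1), c(0), c(1), ..., c(J)."""
    c = third_order_slice(y, J, T)
    return np.concatenate([c[::-1], c[1:]])

def enhance(y, J, T=1):
    """Filter y with its own cumulant-based impulse response."""
    h = cumulant_filter(y, J, T)
    gamma = 1.0 / np.sum(np.abs(h))  # placeholder gain, NOT the unity-skewness constant
    return gamma * np.convolve(y, h, mode="same")

# Hypothetical ICA output: a damped sinusoid in white Gaussian noise.
rng = np.random.default_rng(0)
k = np.arange(2000)
ep = np.exp(-k / 300.0) * np.sin(2 * np.pi * k / 100.0)
noisy = ep + 0.5 * rng.standard_normal(k.size)
filtered = enhance(noisy, J=16)      # 2J = 32, i.e. a filter of order 32
```

Because the impulse response is estimated from the input itself, no template of the EP is needed; the only design choices are the half-length J and the shift T.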

4.  SIMULATION  RESULTS 

To examine the effectiveness of the proposed approach,
extensive simulations have been carried out. Due to space
limitations we present here only one illustrative example. In
this example, two simulated models for the EP signals are given
in Figure 1. Four observed evoked potentials are obtained using
a 4x2 mixing matrix of random entries, with the maximum
absolute value of each column adjusted to unity. Additive
colored Gaussian noise simulating the ongoing EEG is generated
for each sensor by a moving average system of order 65, computed
as follows. First, a deterministic moving average is taken as
the coefficients of a Hamming-window finite impulse response
(FIR) filter whose normalized frequency band is [0.05 0.1].
Next, for each sensor, Gaussian random variables are added to
the deterministic impulse response so that the ratio of the
variance of the deterministic impulse response to that of these
random variables is 10. The signal-to-noise ratio, defined as
the ratio of the power of each observed signal to that of the
additive noise, is adjusted to 0.0 and -10.0 dB. The independent
component analysis procedure is first applied to the observed
(superimposed) sensor signals. Next, a cumulant-based FIR filter
of order 32 is applied to each independent component in order to
obtain the desired decomposed EP signals.
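
The noise-generation recipe above can be sketched as follows. Several details are assumptions, since the text does not fix them: the band edges are taken as normalised to the Nyquist frequency, the band-pass is a plain windowed-sinc design, and the function names and the test sinusoid are hypothetical.

```python
import numpy as np

def hamming_bandpass(order, f_lo, f_hi):
    """Windowed-sinc band-pass FIR; band edges normalised so that 1.0 is Nyquist."""
    n = np.arange(order + 1) - order / 2.0
    ideal = f_hi * np.sinc(f_hi * n) - f_lo * np.sinc(f_lo * n)
    return ideal * np.hamming(order + 1)

def coloured_noise(n_samples, rng, order=65, band=(0.05, 0.1), var_ratio=10.0):
    """White Gaussian noise passed through a randomly perturbed MA system."""
    h = hamming_bandpass(order, *band)
    # Perturbation variance is 1/var_ratio of the deterministic coefficient variance.
    h = h + rng.standard_normal(h.size) * np.sqrt(h.var() / var_ratio)
    white = rng.standard_normal(n_samples + order)
    return np.convolve(white, h, mode="valid")     # exactly n_samples long

def scale_to_snr(signal, noise, snr_db):
    """Scale noise so that 10*log10(P_signal / P_noise) equals snr_db."""
    p_s = np.mean(signal ** 2)
    p_n = np.mean(noise ** 2)
    return noise * np.sqrt(p_s / (p_n * 10.0 ** (snr_db / 10.0)))

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * np.arange(5000) / 250.0)   # stand-in for one observed signal
noise = scale_to_snr(clean, coloured_noise(clean.size, rng), snr_db=-10.0)
observed = clean + noise
```

Drawing a fresh perturbation for every sensor gives each channel a slightly different noise spectrum while keeping the energy concentrated in the same band.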


Figure 1. Two signal models for evoked potentials. (Horizontal
axis of both panels: no. of samples, n, from 0 to 5000.)

Results for SNR = 0.0 dB show that the proposed approach is able
to separate the two EP signal models. The cumulant-based
filtering has also dramatically reduced the additive noise,
which confirms the benefit of the filtering technique for the
overall performance of the presented approach. Figures for the
SNR = 0.0 dB case have been omitted for lack of space. Figure 2
shows the results for SNR = -10.0 dB: Figure 2(a) shows the
observed sensor signals, Figure 2(b) the independent components
obtained using the ICA, and Figure 2(c) the filtered version of
the ICA outputs. The filtered independent components clearly
represent the desired EP signals. Note that the final
cumulant-based filter is necessary to reduce the additive noise
corrupting the desired EP signals.

Experimental Results- To examine the proposed approach on
real-world data, simultaneous visual and auditory EPs were
recorded from a male subject using a 64-electrode EEG system.
The visual stimulus was a flash-like white circle on a black
background that appeared for 50 msec at a rate of 1/sec. The
auditory stimulus was a click of width 50 msec modulated by a
1 kHz single tone, also at a rate of 1/sec. The visual stimulus
was presented first for 50 msec, followed by the auditory
stimulus. The recorded data were filtered using a 0.05-250 Hz
bandpass filter and sampled at 2000 Hz. In a single trial we
recorded 2000 samples (1 sec) after the visual stimulus from
electrodes O2, O1, Oz, FC1, F2 and F1. The reference was the
left mastoid. Figure 3(a) shows the raw data for 500 msec after
the visual stimulus. Figure 3(b) shows the signals separated by
the ICA, in which two evoked potentials appear: the P100 of the
visual EP and the N100 of the auditory EP are clearly observed.
The time delay between the two peaks is about 50 msec. After
cumulant-based matched filtering, SNR enhancement is obtained,
as shown in Figure 3(c).

5.  CONCLUSION 

A novel approach has been described for extracting superimposed
single-trial evoked potential components elicited by different
stimuli presented simultaneously or within a short interval of
each other. In this approach, we first apply a robust
independent component analysis (ICA) method to the observed
sensor signals to separate the superimposed EP components. Next,
the desired EP components are estimated by FIR matched filtering
of the noise-corrupted separated signals. The impulse response
of this FIR filter is computed from third-order cumulants of the
filter input signal. Because third-order cumulants are
insensitive to any symmetrically distributed noise (white or
colored), the proposed approach is capable of extracting EP
source signals at very low SNR. Simulation results have
confirmed the efficiency of the presented approach.
Figure 2. Results for the -10.0 dB SNR example: (a) the observed
sensor signals; (b) independent (separated) signals; (c) the
filtered version of the independent signals in (b).
Figure 3. Results for the real-world data: (a) the observed raw
sensor signals; (b) independent (separated) signals; (c) the
filtered version of the independent signals in (b).
ABSTRACT

To provide adequate information that would assist surgeons in
performing advanced refractive corrections, it is essential to
address the problem of microfluctuations in the eye's
aberrations due to pulse and respiration. Although the effects
of fluctuations in defocus are known and well described, very
little is reported on modelling the fluctuations in other types
of aberrations. We propose a methodology in which the dynamics
of higher order aberration components are modelled by parametric
AM-FM signals. Using our modelling approach, the effects of
changes in these aberrations could be predicted and studied. In
particular, we model the dynamics of components related to coma
and spherical aberration. We provide a validation of the
proposed modelling approach using aberration data from the eyes
of six subjects.

1.  INTRODUCTION 

The interest in customised refractive surgery by optometrists
and ophthalmologists as well as the patients is growing
significantly. It is predicted that these advanced customised
refractive surgeries will correct many aberrations of the eye,
providing vision levels that currently cannot be achieved [1].
Such vision would be limited only by the resolution of the
retinal photoreceptors and diffraction due to the pupil aperture
[2]. However, these modern surgical procedures depend to a great
extent on advances in eye measurement systems, such as wavefront
sensors. One of the current problems in such systems is that the
dynamics of the eye's aberrations, due to pulse and respiration
[3, 4], are not taken into account.

The aberrations of the eye are usually described in terms of a
scaled optical path difference between a ray passing through the
optical system of the eye at a certain point in the pupil and
the principal ray. This scaled optical path difference is
referred to as the wavefront aberration or wavefront error.

The wavefront error of an eye can be measured with a
Hartmann-Shack sensor [5]. It is an optical instrument equipped
with a laser, an array of small lenses, and a CCD video camera.
The light reflects from the retina, passes through the array of
lenses, and forms a grid image that falls on the CCD. The
displacements in the grid image, from the ideal square grid, are
used to calculate transversal aberrations which are related to
the wavefront error.

Wavefront error is often given a functional form in terms of
basis functions. The discrete wavefront aberration data, denoted
in polar coordinates as $W(r_d, \theta_d)$, can be modelled by a
finite series of discrete basis functions

$$W(r_d, \theta_d; n) = \sum_{p=1}^{P} z_p(n)\,\Phi_p(r_d, \theta_d) + \varepsilon_d, \qquad (1)$$

where $z_p(n)$, $n = 0, \ldots, N-1$ are the time-varying
aberration coefficients, $\Phi_p(r_d, \theta_d)$ is the $p$-th
discrete basis function sampled from $\Phi_p(r, \theta)$ at the
discrete points $d = 1, 2, \ldots$, and $\varepsilon_d$ denotes
the measurement noise. In some cases, such sampling may require
further orthogonalisation using the Gram-Schmidt procedure. The
most popular basis functions amongst vision researchers are the
Zernike polynomials [6, 7], because each of the terms in the
expansion can be related to a particular type of aberration. For
example, the fourth term corresponds to defocus, the fifth and
sixth to astigmatism, the seventh and eighth to coma, and the
11th relates to spherical aberration. In most classifications,
the first six Zernike terms are assigned as lower-order terms
since they can be corrected with traditional spectacles.
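
For one time instant, fitting the expansion coefficients to sampled wavefront data is a linear least-squares problem. The sketch below is only an illustration under stated assumptions: it uses four unnormalised low-order Zernike terms (piston, two tilts, defocus) rather than the 15-term series used later in the paper, and the sample points, coefficient values and noise level are made up.

```python
import numpy as np

def basis_matrix(r, theta):
    """Columns are basis functions sampled at the points (r_d, theta_d)."""
    return np.column_stack([
        np.ones_like(r),          # piston
        r * np.cos(theta),        # tilt (horizontal)
        r * np.sin(theta),        # tilt (vertical)
        2.0 * r ** 2 - 1.0,       # defocus
    ])

rng = np.random.default_rng(1)
n_points = 200
r = np.sqrt(rng.uniform(size=n_points))            # uniform over the unit pupil
theta = rng.uniform(0.0, 2.0 * np.pi, size=n_points)

Phi = basis_matrix(r, theta)
z_true = np.array([0.0, 0.3, -0.1, 0.5])           # hypothetical coefficients
w = Phi @ z_true + 0.01 * rng.standard_normal(n_points)   # noisy wavefront samples

# Least-squares estimate of the coefficients for this one frame.
z_hat, *_ = np.linalg.lstsq(Phi, w, rcond=None)
```

Repeating the fit for every frame of a recording gives one time series per coefficient, which is the quantity whose dynamics are modelled.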

In this work, we focus on dynamics of higher order aberrations,
and in particular, of coma and spherical aberration. The values
of the coefficients associated with each term vary in time [4],
due primarily to changes in accommodation (focusing) which, in
turn, are affected by pulse and respiration [3]. In order to
predict and study the effects of dynamics of the eye's
aberrations, it is desired to develop appropriate parametric
models for the dynamics of each of the aberrations.

The paper is organised as follows. In the next section, we
provide an overview of the protocol of aberration data
acquisition. In Section 3, we describe the model for the
components of the higher order aberrations. This is followed by
the experimental results given in Section 4.


2.  DATA  ACQUISITION 

A custom made Hartmann-Shack sensor was used for measuring the
aberrations of the optical system of the eye. Six subjects were
used in the study. For each subject, a series of grid images
sampled at 10 Hz were taken within a period of 5 seconds. An
example of a typical grid image is shown in Figure 1. The
sampling frequency was chosen well above the Nyquist rate for
signals that exist in the cardiopulmonary system [3]. The limit
of 50 images was due to the physical capacity of computer
memory.


Fig.  1.  Typical  grid  image  of  the  Hartmann-Shack  sensor. 

The six considered subjects were aged between 20 and 30 years
and had normal healthy eyes. They were asked to focus on the
instrument's fixation target. All subjects were optically
corrected for lower order aberrations with spectacle
corrections. Ten series of 50 grid images were recorded for each
subject. There was no blinking during the acquisition of each
series. The study met the requirements of the university Human
Research Ethics Committee.

For each grid image, a centroid detection algorithm was used to
determine the transversal aberrations and wavefront slopes.
Then, from this information, the wavefront error was derived. A
series of the first 15 Zernike polynomials was then fitted to
each wavefront error, resulting in a 50 data point time-series
for each type of aberration. In the following, we focus on
higher order terms since the first six Zernike terms correspond
to aberrations that were optically corrected.


3.  MODELLING  APPROACH 

Initially, we have performed a time, frequency, and
time-frequency analysis of the data for each of the higher order
aberrations. The time-frequency analysis has indicated that each
of the Zernike coefficients can be modelled as

$$z_p(n) = \sum_{l=1}^{L} a_{p,l}(n), \qquad n = 0, \ldots, N-1,$$

where $a_{p,l}(n)$, $l = 1, \ldots, L$ denote aberration
components that are well-separated in frequency. This allows us
to band-pass filter each aberration to extract the components of
interest corresponding to pulse and respiration.

Fig. 2. The spectrogram of the dynamics measured for the eighth
Zernike coefficient corresponding to horizontal coma.

Fig. 3. The spectrogram of the dynamics measured for the
eleventh Zernike coefficient corresponding to spherical
aberration.

We have also observed that the spectral characteristics of the
aberration components $a_{p,l}(n)$ vary in time. This is
demonstrated in Figures 2 and 3 which show the spectrogram of
the horizontal coma and spherical aberration, respectively, for
a subject with a significant amount of astigmatism. The
non-stationary characteristics of the aberration components are
clearly evident.

Motivated by these results, we propose to model each of the
aberration components as an amplitude modulated-frequency
modulated (AM-FM) signal. In particular, we consider the
following parametric model:

$$a_{p,l}(n) = \left(\sum_{m=0}^{M} g_m (n/N)^m\right) \cos\left(\sum_{q=0}^{Q} b_q n^q\right), \qquad (2)$$

$n = 0, \ldots, N-1$. We form the analytic signal

$$c_{p,l}(n) = a_{p,l}(n) + j\,\mathcal{H}\{a_{p,l}(n)\},$$

where $\mathcal{H}\{\cdot\}$ denotes the Hilbert transform, which,
under the assumption that the amplitude is low-pass, can be
approximated by

$$c_{p,l}(n) \approx \left(\sum_{m=0}^{M} g_m (n/N)^m\right) \exp\left(j \sum_{q=0}^{Q} b_q n^q\right),$$

$n = 0, \ldots, N-1$. Estimation of the model parameters is
performed in two steps. First, we perform order selection to
estimate appropriate values of $M$ and $Q$. We have experimented
with two methods for model order selection. One procedure,
described in [8], is based on multiple hypothesis testing. The
other procedure is based on the bootstrap [9]. As an example, we
show the application of the parametric modelling procedure of
[8] to the spherical aberration component shown in Figure 3. A
significance level of 1% was used in the order selection
procedure, which gave $M = 14$ and $Q = 4$. Using these orders,
the fitted values of the amplitude and phase and their
non-parametric estimates are plotted against time in Figures 4
and 5.


Fig. 4. Parametric modelling of the amplitude of the spherical
aberration component, plotted against time (s). The
non-parametric estimate (solid) and the fitted values (dashed)
are shown.

Similar orders for the amplitude and phase were obtained with
the bootstrap method. In this case, we find estimates
$\hat{g}_0, \ldots, \hat{g}_M$, $\hat{b}_0, \ldots, \hat{b}_Q$ of
the amplitude and phase parameters using linear regressions. It
can be seen that the amplitude model provides a close fit to the
actual amplitude while the phase model is reasonably close to
the actual phase. Clearly, the AM-FM signal could be a valid
model for the spherical aberration component. Validation of the
proposed model is given in the next section.
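
The estimation step can be sketched as follows: form the analytic signal, take its modulus and unwrapped phase as non-parametric estimates, and fit polynomials to each by linear regression. This is an illustrative sketch under assumptions, not the authors' code: model orders are passed in rather than selected by hypothesis testing or the bootstrap, both polynomials are expressed in the normalised variable t = n/N for numerical conditioning, and the test signal and function names are hypothetical.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def analytic_signal(x):
    """Analytic signal via the FFT (zero the negative-frequency half)."""
    x = np.asarray(x, dtype=float)
    N = x.size
    gain = np.zeros(N)
    gain[0] = 1.0
    if N % 2 == 0:
        gain[N // 2] = 1.0
        gain[1:N // 2] = 2.0
    else:
        gain[1:(N + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * gain)

def fit_am_fm(x, M, Q):
    """Polynomial fits (lowest order first) to the amplitude and phase."""
    N = len(x)
    t = np.arange(N) / N
    c = analytic_signal(x)
    g = P.polyfit(t, np.abs(c), M)                   # amplitude coefficients
    b = P.polyfit(t, np.unwrap(np.angle(c)), Q)      # phase coefficients
    return g, b

def am_fm_model(g, b, N):
    """Evaluate the fitted amplitude-times-cosine model on n = 0..N-1."""
    t = np.arange(N) / N
    return P.polyval(t, g) * np.cos(P.polyval(t, b))

# Hypothetical 50-sample component with slowly varying amplitude and frequency.
N = 50
t = np.arange(N) / N
x = (1.0 + 0.3 * t) * np.cos(2 * np.pi * 3 * t + 2.0 * t ** 2)
g, b = fit_am_fm(x, M=3, Q=2)
x_fit = am_fm_model(g, b, N)
```

With only 50 samples, the FFT-based analytic signal has noticeable end effects, which is one reason the quality of the fit should be judged with an error measure over many records rather than from a single example.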
Fig. 5. Parametric modelling of the phase of the spherical
aberration component. The non-parametric estimate (solid) and
the fitted values (dashed) are shown.

4.  EXPERIMENTAL  RESULTS 

As mentioned in Section 2, six subjects were used in the study.
The best seven out of 10 data acquisitions have been selected
for the analysis. In this way, we have ensured that the same
size of grid was fitted to each grid image for calculation of
the wavefront error. The variations in the grid size between the
images were due to changes in the pupil size. Three higher order
terms of the Zernike expansion were analysed. These were the
horizontal coma, the vertical coma, and the spherical
aberration. As noted before, the subjects were corrected for
lower-order aberrations by wearing their usual spectacles.

For each of the terms, a low frequency single component was
extracted by using a 1.5 Hz FIR low-pass filter of order 10.
This was performed first manually on a number of sequences and
then automated. All the data were detrended before filtering.
Then, model order selection procedures were used to determine
the parameters Q and M. It has been found that the procedure
based on the bootstrap is more robust than the one based on
multiple hypothesis testing. This is mainly due to the ability
of the bootstrap to find a correct model for short data samples
[9]. For each model a mean-square error (MSE) of the fit was
calculated and then averaged across the seven records.

It has been observed that no single model can be fitted to the
components of coma and the spherical aberration. We found that
the model order of the phase ranges from Q = 3 to Q = 5.
However, isolated cases in which both model selection procedures
gave higher orders for the phase were observed. A wider range of
model orders was recorded for the amplitudes (from M = 8 to
M = 20). In Tables 1-3, we show the average standardised MSE for
the amplitude and phase of the components associated with the
coma and the spherical aberration.

The results indicate that the dynamics of the considered
aberration components can be well modelled by a polynomial phase
signal of order less than Q = 5. Small variations in the average
standardised MSE for the amplitude fit were observed. However,
in the interest of parsimony, we would like to avoid choosing a
model order that results in the number of parameters being a
substantial fraction of the sample size. Therefore, we choose
M = 11 which results in a fit whose average standardised MSE
does not exceed 3%. It is possible that choosing a different set
of basis functions would result in a lower order model.

Table 1. Average values of the standardised MSE for the
amplitude (top) and phase (bottom) of the components
corresponding to vertical coma.

             Q    M = 8    M = 11   M = 14   M = 17   M = 20
Amplitude    3    0.0541   0.0249   0.0134   0.0072   0.0030
             4    0.0504   0.0240   0.0131   0.0071   0.0030
             5    0.0496   0.0253   0.0137   0.0073   0.0033
Phase        3    0.0047   0.0047   0.0047   0.0047   0.0047
             4    0.0036   0.0036   0.0036   0.0036   0.0036
             5    0.0028   0.0028   0.0028   0.0028   0.0028

Table 2. Average values of the standardised MSE for the
amplitude (top) and phase (bottom) of the components
corresponding to horizontal coma.

             Q    M = 8    M = 11   M = 14   M = 17   M = 20
Amplitude    3    0.0632   0.0300   0.0145   0.0060   0.0024
             4    0.0528   0.0279   0.0135   0.0059   0.0024
             5    0.0509   0.0253   0.0121   0.0060   0.0026
Phase        3    0.0037   0.0037   0.0037   0.0037   0.0037
             4    0.0029   0.0029   0.0029   0.0029   0.0029
             5    0.0024   0.0024   0.0024   0.0024   0.0024

Table 3. Average values of the standardised MSE for the
amplitude (top) and phase (bottom) of the components
corresponding to spherical aberration.

             Q    M = 8    M = 11   M = 14   M = 17   M = 20
Amplitude    3    0.0566   0.0271   0.0117   0.0052   0.0023
             4    0.0512   0.0251   0.0116   0.0055   0.0023
             5    0.0467   0.0233   0.0110   0.0056   0.0023
Phase        3    0.0046   0.0046   0.0046   0.0046   0.0046
             4    0.0035   0.0035   0.0035   0.0035   0.0035
             5    0.0028   0.0028   0.0028   0.0028   0.0028

5.  CONCLUSIONS 

We have addressed the problem of modelling the dynamic changes
in the components of higher order aberrations in the human eye.
We proposed a parametric AM-FM signal model for these dynamics.
Although the data samples were short, we have shown that the
proposed modelling approach is viable and could be used for
prediction and study of higher order aberrations. Our
methodology can be used in the design of protocols that deal
with the measurement of aberrations in the human eye and take
into account the dynamic characteristics of these aberrations.
ABSTRACT

Methods for determining the letters of our genetic code, known
as DNA sequencing, currently depend on clever use of
electrophoresis to generate data sets indicative of the
underlying sequence. Typically the subsequent off-line data
processing is carried out using a combination of heuristic
methods with little mathematical rigour. In this paper, we
present a novel model which is able to accurately predict the
effect of the many biological processes which are involved, and
moreover, which is usable on-line. Off-line methods have been
hampered by the need for processing in as little time as
possible after the data is generated; performing the processing
on-line has enabled a more advanced algorithm to be used with
associated improved performance. The algorithm is framed within
a Bayesian probabilistic framework, thereby allowing
representation of the random nature of the generative process,
and relies on new advances in the burgeoning field of Sequential
Monte Carlo Methods to perform non-linear filtering and model
selection operations.

1.  INTRODUCTION 

In the majority of living organisms, genetic information is
encoded using a molecule known as Deoxyribonucleic acid (DNA)
which may, for our purposes, be thought of as a sequence of
chemical bases taken from a possible set of four: Adenine (A),
Guanine (G), Cytosine (C), and Thymine (T). The determination of
the genetic code, known as DNA sequencing, is important if
genetic disease is to be properly understood.

In 1977, Sanger proposed a method for DNA sequencing which, with
technical improvements, has since been almost universally
accepted [10]. A simplified version of the Sanger sequencing
process is now presented; for a more complete treatment see [4].
Initially, via a process of replication and truncation, the DNA
sequence of interest is used to form a large population of
partial replicas. Each replica is identical to the sequence of
interest over a range of bases, always commencing with the first
base of the initial sequence, and terminating some random
distance down the strand. That is, for the sequence ACGGG the
population would contain a number of each of the following: A,
AC, ACG, ACGG, and ACGGG. Each fragment is fluorescently labelled
according to its terminating base. Subsequently, the entire
population is aligned at the start of a large rectangular gel,
and an electric field is applied. The fragments progress through
the gel at rates approximately inversely proportional to their
length, resulting in the various subpopulations arriving at the
end of the gel in sequence order. A laser positioned near the end
of the gel excites the fluorescent labels, allowing an emission
detector to estimate the number of fragments terminated by a
given base passing at each time instant.

After some preprocessing, four data sets are obtained
(henceforth, channels), corresponding to the variation of
fragment concentration with time for each of the four terminating
bases. This collection of data is known as an electropherogram
and is quite clearly indicative of the underlying base sequence;
an example data set is shown in Figure 2. The electropherogram is
a mixture of peaks in four channels, with each base in the
sequence associated with one major peak in the corresponding
channel, and three secondary peaks in the remaining channels
which result from leakage effects; the peaks corresponding to a
particular base have common position and shape. A range of prior
information, mainly detailing the effect of base sequence on the
amplitudes and positions of the peaks, is available to constrain
the problem; [11] provides a good review.

The current state-of-the-art from an off-line signal processing
perspective, Phred, is described in [4], and uses a combination
of heuristic, but effective, peak detection algorithms. In [5],
an alternative block-based algorithm based on statistical
modelling of the underlying process is proposed that outperforms
Phred on some datasets, although at greater computational cost.

Here, we present a model similar to that of [5,6], which is
capable of representing available prior information about the
system. The major improvements of the model are that it allows
removal of slowly varying background noise, and is also able to
track nonstationarity in the various processes. In particular,
variation in the spacing of peaks across the electropherogram is
properly modelled, thereby reducing inferential ambiguity in more
difficult data regions. The resulting algorithm can be run
on-line and has immediate application to all data sets which
comprise a series of peaks arriving sequentially in time (as
encountered, for example, in some spectroscopy applications).

2.  PROBLEM  FORMULATION 

As mentioned in the introduction, each fragment population
generates peaks in all four channels as it passes the end of the
gel. These are observed in a combination of slowly-varying and
approximately i.i.d. white noise, suggesting the following
general model for electropherogram data in the four channels at
time $n \in \{1, \dots, N\}$,
$\mathbf{y}_n \triangleq \{y_{n,A}, y_{n,G}, y_{n,C}, y_{n,T}\}$:

$$\mathbf{y}_n = \mathbf{e}_n + \mathbf{t}_n + \sum_{i=1}^{k} a_i \mathbf{w}_i \phi_i(n)$$

where $k$ denotes the total number of bases in the sequence,
$a_i$ is representative of the number of fragments in the
population corresponding to the $i$th base in the sequence,
$\mathbf{t}_n = \{t_{n,A}, \dots, t_{n,T}\}$ denotes a background
trend in the four channels,
$\mathbf{w}_i = \{w_{i,A}, w_{i,G}, w_{i,C}, w_{i,T}\}$ is a vector
defining the emission spectrum in the four channels, and
$\phi_i(n)$ defines the peak shape. The uncorrelated noise in the
system at time $n$, $\mathbf{e}_n = \{e_{n,A}, \dots, e_{n,T}\}$,
is assumed Normally distributed,
$\mathbf{e}_n \sim \mathcal{N}(\mathbf{e}_n | \mathbf{0}, \sigma_e^2 \mathbf{I}_{4\times4})$,
where $\mathbf{I}_{4\times4}$ denotes the identity matrix.

Here, the peaks are assumed truncated Gaussian in shape such
that:

$$\phi_i(n) = (2\pi v_i)^{-1/2} \exp\left\{-\frac{1}{2 v_i}(n - p_i)^2\right\} \mathbb{I}(|n - p_i| < \epsilon)$$

where $v_i$ denotes the variance of peak $i$, and $p_i$ denotes
its position. $\epsilon$ is defined a priori to be sufficiently
large that, for the range of possible variances, the truncation
effect is minimal.
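
As an illustration of the observation model, the following Python sketch evaluates the truncated Gaussian peak shape and synthesises four-channel data from a list of peaks. It is only a sketch under stated assumptions: the function names `peak_shape` and `synthesise`, the channel order A, G, C, T, and all numerical values are hypothetical and are not taken from the paper.

```python
import math
import random

def peak_shape(n, p, v, eps):
    """Truncated Gaussian peak with variance v and position p.

    The value is zero whenever |n - p| >= eps (the truncation indicator).
    """
    if abs(n - p) >= eps:
        return 0.0
    return math.exp(-((n - p) ** 2) / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)

def synthesise(N, peaks, trend, sigma_e, eps, rng):
    """Return y[n - 1][c] for n = 1..N over the four channels.

    peaks is a list of (a, w, p, v): amplitude, 4-vector emission spectrum,
    position and variance. trend(n) returns the 4-vector background at time n.
    """
    data = []
    for n in range(1, N + 1):
        t = trend(n)
        row = []
        for c in range(4):
            signal = sum(a * w[c] * peak_shape(n, p, v, eps)
                         for a, w, p, v in peaks)
            row.append(rng.gauss(0.0, sigma_e) + t[c] + signal)
        data.append(row)
    return data

# Hypothetical values, chosen only for illustration.
peaks = [(50.0, [1.0, 0.1, 0.05, 0.05], 20.0, 4.0),   # major peak in channel A
         (40.0, [0.1, 1.0, 0.05, 0.05], 32.0, 4.0)]   # major peak in channel G
y = synthesise(60, peaks, lambda n: [0.01 * n] * 4, 0.05, 12.0, random.Random(0))
```

Each peak contributes to all four channels through its emission vector, which is how the leakage peaks described in the introduction arise in this sketch.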

The base-state of the system at base position $i$,
$\mathbf{s}_i = \{s_{i,-2}, s_{i,-1}, s_{i,0};\ s_{i,j} \in \mathcal{B} = \{A, G, C, T\}\}$,
is defined as a base triplet, e.g., $\{A, G, C\}$, with the last
element corresponding to the $i$th base in the sequence, and the
first two elements containing the previous two bases in the
sequence (these are included to account for sequence dependent
effects, see [8]). The state is assumed to evolve in a Markovian
fashion:

$$s_{i,0} \sim p(s_{i,0} | s_{i-1,0}), \qquad s_{i,-2:-1} = s_{i-1,-1:0}$$

where henceforth, the notation $r_{i:j} = \{r_i, \dots, r_j\}$ is used.
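
The triplet evolution can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `evolve_state` and the uniform transition table are assumptions, not values from the paper.

```python
import random

BASES = "AGCT"

def evolve_state(prev, transition, rng):
    """Advance a base triplet by one position.

    prev is (s_{i-1,-2}, s_{i-1,-1}, s_{i-1,0}). The two context elements are
    copied forward deterministically; only the newest base is random, drawn
    from transition[previous base].
    """
    probs = transition[prev[2]]
    new_base = rng.choices(BASES, weights=[probs[b] for b in BASES])[0]
    return (prev[1], prev[2], new_base)

# Hypothetical uniform transition table.
uniform = {b: {c: 0.25 for c in BASES} for b in BASES}
state = evolve_state(("A", "G", "C"), uniform, random.Random(1))
```

After one step the first two elements of `state` are G and C, whatever base is drawn for the third.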

The peak position prior attempts to incorporate the idea that the
mean peak spacing between bases $i - 1$ and $i$, $\mu_{p,i-1}$,
varies slowly across the electropherogram, while local peak
jitter, represented by $\sigma_p^2$, can be substantially larger.
Moreover, $\gamma_p(\mathbf{s}_i)$ is included to represent
special sequence dependent effects.

$$\mu_{p,i-1} \sim \mathcal{N}(\mu_{p,i-2}, \sigma_{\mu_p}^2)$$

$$p_i \sim \mathcal{N}(p_{i-1} + \gamma_p(\mathbf{s}_i)\mu_{p,i-1}, \sigma_p^2)$$

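The amplitude process similarly has a slowly varying mean,
$\mu_{a,i}$, with possibly substantial local jitter represented
by $\sigma_a^2$:

$$\mu_{a,i} \sim \prod_{j \in \mathcal{B}} \mathcal{N}(\mu_{a,i,j} | \mu_{a,i-1,j}, \sigma_{\mu_a}^2)$$

$$a_i \sim \mathcal{N}(\gamma_a(\mathbf{s}_i)\mu_{a,i,s_{i,0}}, \sigma_a^2)$$

where $\mu_{a,i,s_{i,0}}$ denotes the mean expected from a base of
type $s_{i,0}$, with the effect of surrounding base sequence
represented by $\gamma_a(\mathbf{s}_i)$.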

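The emission spectrum of a population with local sequence
configuration $\mathbf{s}_i$ is assumed to have mean in channel
$j \in \mathcal{B}$, $\mu_{w,j}(\mathbf{s}_i)$, and variance
$\sigma_{w,j}^2(\mathbf{s}_i)$:

$$\mathbf{w}_i \sim \prod_{j \in \mathcal{B}} \mathcal{N}(w_{i,j} | \mu_{w,j}(\mathbf{s}_i), \sigma_{w,j}^2(\mathbf{s}_i))$$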


The variances are assumed to evolve according to a slowly-varying
random walk based on the Gamma distribution:

$$v_i \sim \mathcal{G}(\alpha_v(v_{i-1}), \beta_v(v_{i-1}))$$

where $\alpha_v(v_{i-1})$, $\beta_v(v_{i-1})$ are chosen such that
the expectation of $v_i$ is equal to $v_{i-1}$.
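
One way to satisfy the stated constraint is to hold the Gamma shape fixed and set the scale to the previous variance divided by the shape, since a Gamma variable has mean shape times scale. The sketch below assumes this particular parameterisation, which the paper does not specify; `next_variance` is a hypothetical name.

```python
import random

def next_variance(v_prev, shape, rng):
    """Draw v_i from a Gamma with the given shape and scale v_prev / shape.

    The mean is shape * scale = v_prev, so the walk has no drift. A larger
    shape gives smaller steps (standard deviation v_prev / sqrt(shape)).
    Note that random.gammavariate takes (shape, scale), not (shape, rate).
    """
    return rng.gammavariate(shape, v_prev / shape)

rng = random.Random(0)
draws = [next_variance(4.0, 200.0, rng) for _ in range(20000)]
mean = sum(draws) / len(draws)   # close to 4.0
```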

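Here, the background trend is assumed to be independent between
channels and locally linear as described in [7]:

$$\mathbf{t}_n \sim \mathcal{N}(2\mathbf{t}_{n-1} - \mathbf{t}_{n-2}, \sigma_t^2 \mathbf{I}_{4\times4})$$

where $\sigma_t^2$ controls the smoothness of the trend.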

2.1.  State-Space  Form 

At each instant, the set of peaks affecting the data is limited
as a result of the truncated peak shape. Defining $\mathcal{I}_n$
to denote the set of peak indices corresponding to bases
affecting the data at time $n$, with the first future base that
does not yet affect the data also included, the observation
equation becomes:

$$\mathbf{y}_n = \mathbf{e}_n + \mathbf{t}_n + \sum_{i \in \mathcal{I}_n} a_i \mathbf{w}_i \phi_i(n)$$

We define $\theta_n = \{a_i, \mathbf{w}_i, p_i, v_i, \mathbf{s}_i\}$,
$i \in \mathcal{I}_n$, to be the system-state at time $n$. The
number of bases included in the system-state at time $n$,
$k_n = \dim \mathcal{I}_n$, varies with time according to the
number of peaks affecting the data. The resulting model is a
Hidden Markov Model, with system-state evolution distribution
$p(\theta_n, k_n | \theta_{n-1}, k_{n-1})$. The translation of the
priors of the previous section to time-indexed form is
predominantly trivial. In brief, there are four possible
birth/death scenarios at each time instant:

•  The set of peaks affecting the data remains the same:
$\theta_n = \theta_{n-1}$, $k_n = k_{n-1}$.

•  A valid peak at time $n - 1$ ceases to affect the data at time
$n$, and there are no new peaks:
$\theta_{n,1:k_n} = \theta_{n-1,2:k_{n-1}}$, $k_n = k_{n-1} - 1$.

•  There is a new peak at time $n$, and no other peaks cease to
affect the data:
$\theta_{n,1:k_n} = \{\theta_{n-1,1:k_{n-1}}, \theta_{n,k_n}\}$,
$k_n = k_{n-1} + 1$. $\theta_{n,k_n}$ denotes the parameters of the
new base, as generated by the evolution equations of the previous
section.

•  There is a new peak at time $n$, and one peak ceases to affect
the data: $\theta_{n,1:k_n} = \{\theta_{n-1,2:k_{n-1}}, \theta_{n,k_n}\}$,
$k_n = k_{n-1}$. $\theta_{n,k_n}$ denotes the parameters of the new
base.
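
The four scenarios can be read off by comparing the index sets at consecutive times. The Python sketch below assumes 0-based peak indices, peak positions sorted in increasing order and integer time steps; the helper names `active_set` and `scenario` are hypothetical and do not appear in the paper.

```python
def active_set(n, positions, eps):
    """Indices of peaks with |n - p_i| < eps, plus the first future peak.

    The first future peak is the earliest one that lies at least eps ahead
    of n and therefore does not yet affect the data.
    """
    active = [i for i, p in enumerate(positions) if abs(n - p) < eps]
    future = [i for i, p in enumerate(positions) if p - n >= eps]
    if future:
        active.append(future[0])
    return active

def scenario(prev, curr):
    """Classify the change between the index sets at times n - 1 and n."""
    born = bool(curr) and (not prev or curr[-1] > prev[-1])
    died = bool(prev) and (not curr or curr[0] > prev[0])
    return {(False, False): "same", (False, True): "death",
            (True, False): "birth", (True, True): "birth and death"}[(born, died)]

positions = [10, 21, 30]   # hypothetical peak positions
eps = 6
label = scenario(active_set(15, positions, eps), active_set(16, positions, eps))
```

With these values the peak at 10 leaves the set at time 16 while the peak at 30 enters it as the new future peak, so `label` is "birth and death" and the state dimension is unchanged.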

2.2.  Estimation  Objectives 

In a Bayesian framework the posterior distribution at time $n$,
$p(\theta_{1:n}, k_{1:n} | \mathbf{y}_{1:n})$, is used for
inference, with the expected value of a function of interest
$f(\theta_{1:n}, k_{1:n})$ under this posterior given by
$\int\!\!\int f(\theta_{1:n}, k_{1:n})\, p(\theta_{1:n}, k_{1:n} | \mathbf{y}_{1:n})\, d\theta_{1:n}\, dk_{1:n}$.
In most cases, including ours, the posterior is not amenable to
closed form analysis owing to non-linearity in the parameters,
and it is necessary to resort to numerical methods. Here, we
develop a numerical algorithm to estimate the posterior
distribution recursively in time for on-line estimation.

3.  SEQUENTIAL  SIMULATION

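One means of performing the required integration is to represent
the posterior at each time by a set of weighted particles [3,9]:

$$\hat{p}(d\theta_{1:n}, k_{1:n} | \mathbf{y}_{1:n}) = \sum_{i=1}^{P} \tilde{w}_n^{(i)} \delta_{\theta_{1:n}^{(i)}, k_{1:n}^{(i)}}(d\theta_{1:n}, k_{1:n})$$

where $P$ denotes the number of particles, $\tilde{w}_n^{(i)}$
denotes the normalised importance weight associated with the
particle of value $(\theta_{1:n}^{(i)}, k_{1:n}^{(i)})$, and
$\delta_{\theta_{1:n}^{(i)}, k_{1:n}^{(i)}}(\cdot)$ is the delta function.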
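
To make the weighted-particle representation concrete, the sketch below normalises a set of importance weights, estimates an expectation as a weighted sum over particles, and performs residual resampling in its commonly described form (deterministic copies from the integer part of $P \tilde{w}^{(i)}$, the remainder drawn at random). The function names and the scalar toy particles are assumptions for illustration; the paper's particles are full parameter trajectories.

```python
import math
import random

def normalise(weights):
    """Scale unnormalised importance weights so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]

def expectation(f, particles, weights):
    """Estimate E[f] under the weighted-particle approximation."""
    return sum(w * f(x) for x, w in zip(particles, weights))

def residual_resample(weights, rng):
    """Return P particle indices for normalised weights of length P.

    Each particle is first copied floor(P * w) times; the remaining slots
    are filled by sampling in proportion to the fractional parts.
    """
    P = len(weights)
    counts = [math.floor(P * w) for w in weights]
    residual = [P * w - c for w, c in zip(weights, counts)]
    remaining = P - sum(counts)
    if remaining > 0:
        for j in rng.choices(range(P), weights=residual, k=remaining):
            counts[j] += 1
    return [i for i, c in enumerate(counts) for _ in range(c)]

w = normalise([1.0, 1.0, 2.0])
estimate = expectation(lambda x: x, [0.0, 1.0, 2.0], w)   # 0.25*0 + 0.25*1 + 0.5*2
indices = residual_resample([0.5, 0.25, 0.25, 0.0], random.Random(0))
```

In the last line every product $P \tilde{w}^{(i)}$ is an integer, so the resampled indices are fully deterministic and the zero-weight particle is discarded.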

An  algorithm  for  updating  the  particles  as  time  progresses 
is  [3,9]: 

Algorithm 1 - Sequential Monte Carlo Filter

For n = 2, ..., N
For i = 1, ..., P:

•  Draw from the importance distribution

$$(\theta_n^{(i)}, k_n^{(i)}) \sim \pi(\theta_n, k_n | \theta_{n-1}^{(i)}, k_{n-1}^{(i)}, \mathbf{y}_{1:N})$$

•  Evaluate the unnormalised importance weights:

$$w_n^{(i)} \propto \frac{p(\mathbf{y}_n | \theta_n^{(i)}, k_n^{(i)})\, p(\theta_n^{(i)}, k_n^{(i)} | \theta_{n-1}^{(i)}, k_{n-1}^{(i)})}{\pi(\theta_n^{(i)}, k_n^{(i)} | \theta_{n-1}^{(i)}, k_{n-1}^{(i)}, \mathbf{y}_{1:N})}$$

•  Normalise the importance weights:

$$\tilde{w}_n^{(i)} = \frac{w_n^{(i)}}{\sum_{j=1}^{P} w_n^{(j)}}$$

•  Optional: Resample to obtain $P$ samples approximately
distributed according to $p(\theta_{1:n}, k_{1:n} | \mathbf{y}_{1:n})$.
Set the weights equal.

•  Optional: Apply a Markovian transition kernel invariant to the
posterior for each particle stream.

End For

End For


In the above equations, neither
$p(\theta_n^{(i)}, k_n^{(i)} | \theta_{n-1}^{(i)}, k_{n-1}^{(i)})$ nor
$p(\mathbf{y}_n | \theta_n^{(i)}, k_n^{(i)})$ is directly available,
but both may be calculated by noting that the model is of linear
Gaussian state-space form with respect to the background
$\mathbf{t}_n$ and also the mean peak spacing and amplitude
processes, which can therefore be marginalised using the Kalman
filter (e.g. [1]). Note that the emission spectra can also be
marginalised, although the computational burden introduced can be
prohibitive.

3.1.  Technical  Details

As with the proposal density of MCMC methods, the selection of a
suitable importance function is critically related to the
performance of an SMC algorithm. Here, an importance distribution
based on local linearisation of the model (similar to the
Extended Kalman Filter) is used, the idea being to construct an
importance distribution that approximates the true posterior; see
[3] for a similar approach. It is important to reinforce that
this is not equivalent to making the assumption of linearity, as
the importance distribution is merely used to generate proposals
which are then weighted according to the true posterior.

The resampling step aims to multiply or discard particle
trajectories according to how important they are to our
approximation of the posterior distribution. When a resampling
step is performed we use the standard residual method described
in [9].

Our model is defined on a variable dimension space, with
parameters fixed over moderate time intervals. Degeneracy of the
standard SMC algorithm for such systems is commonly known. Here,
since the interval of invariance is not large, MCMC transitions
can be used to help replenish the particle set [2]. That is, a
kernel invariant to the posterior distribution,
$p(\theta_{1:n}, k_{1:n} | \mathbf{y}_{1:n})$, is applied, the idea
being that such a transition can only decrease the difference
between the current approximate distribution and the invariant
distribution. The kernel used consists of a fixed dimension part
based on a modified Gibbs sampler and a variable dimension part
based on a birth/death - split/merge Reversible Jump kernel; see
[5] for more details. In order to reduce the computational load,
transitions are applied on a subset of the total parameter space,
corresponding to those peaks centered relatively near the current
time.


4.  RESULTS


In this paper, we restrict our discussion to two datasets, one
demonstrating the algorithm's ability to discriminate between
multiple interpretations when the data is ambiguous, and the
other demonstrating its ability to track a slowly-varying
baseline and changing mean peak spacing.

The middle plot of Figure 1 shows a typical example of baseline
tracking, with the predicted baseline (obtained using the Kalman
smoother) accurately tracking the true background. However, small
discrepancies do occur on account of the inescapable ambiguity as
to what is baseline, and what is signal; this is not a fault in
the model itself. The bottom plot of the figure shows the true
mean peak spacing process against the predicted. It can be seen
that the algorithm is performing well.

In Figure 2, a dataset is shown where the standard deviation of
the peak jitter process, $\sigma_p$, was set to one third of the
mean, $\mu_p$. The peaks overlap significantly, and the noise
level is higher than usual. The data is shown superimposed on the
prediction corresponding to two particle trajectories obtained
from the algorithm. It is visually apparent that both particles
provide a reasonable interpretation of the data. The priors on
peak spacing and amplitude, however, skew the inference in favour
of the more parsimonious 8 base model, which corresponds to the
truth. The probabilities of the two models, as given by their
frequency in the particle set, were 0.88 and 0.12 for the 8 and 9
base systems respectively; other model orders were not supported.

Figure 1: An example of baseline tracking. The top plot shows the
predicted signal (dotted) against the data (solid). The middle
plot shows the true background (solid) against the predicted
background (dotted). The bottom plot shows the true mean peak
spacing process (solid) against the predicted process (dotted).

5.  CONCLUSIONS 

We have briefly introduced the DNA sequencing problem, and
provided a meaningful statistical framework in which to represent
available information. This framework was then translated into
one suitable for sequential estimation of the posterior
distribution of interest as it evolves in time. Results of the
algorithm are promising, with tracking of the baseline and other
nonstationarity leading to improved inference. In future, a more
rigorous evaluation of the algorithm against Phred will be
performed.


ABSTRACT


We describe a method to extract single trial Visual Evoked
Potential (VEP) signals buried in ongoing Electroencephalogram
(EEG) activity. The common method for separating VEP from EEG is
signal averaging. Instead, we use digital filters to extract the
VEP, assuming that VEP spectra lie in the gamma band. As an
application, a Fuzzy ARTMAP (FA) neural network classifier with a
voting strategy is used with the extracted VEP to discriminate
alcoholics from normal subjects. The VEP is extracted from
subjects while they view pictures from the Snodgrass and
Vanderwart picture set. The high FA classification rate of 96.5%
supports the validity of the proposed method for removing EEG
contamination.


1.  INTRODUCTION


VEPs are signals generated in the brain in response to a visual
stimulus. Their analysis has become very useful for
neuropsychological studies and clinical purposes. The VEP signal
is embedded in the ongoing EEG with additive noise, causing
difficulty in detection and analysis of this signal. Furthermore,
the SNR of VEP to EEG is very low, approximately -5 dB [5], which
complicates the situation further. The traditional technique for
solving this problem is ensemble averaging [1]. However, this
approach requires many trials, and the averaged signal tends to
smooth out inter-trial information.

In addition, inter-trial variation in latency and amplitude might
serve to distort the VEP signal. In this paper, we propose a
method to extract single trial VEP buried in the spontaneous EEG
activity using digital filters, and use it to discriminate
between alcoholics and control subjects.


2.  VEP  EXTRACTION  FROM  EEG


The extracted signal is first filtered to eliminate EEG signals,
since EEG signal spectra are in the range of 0 to 30 Hz. We
assume that the spectra of the VEP signals lie in the gamma band
centred at 40 Hz.

The z transform of the filter is

$$G(z) = (1 - z^{-1})^{2N}(1 + z^{-1})^{N}. \qquad (1)$$

The integer value $N$ can be increased to reduce the bandwidth of
the filter. After some experimental simulation, we found that a
value of 2 for $N$ is sufficient for our purpose. This band-pass
filter extracts spectra from 29 to 48 Hz (using a 3 dB cutoff and
rounded to the nearest integer) with a sampling frequency of 128
Hz. The first half of (1) acts as a high pass filter while the
second half acts as a low pass filter, which when combined give a
band-pass filter with a maximum gain at 39 Hz, which is close to
the ideal gamma band centre of 40 Hz. This fact can be shown by
replacing $z$ with $e^{j2\pi fT}$ in (1), where $T$ is the
sampling period, which gives us

$$G_N(f) = |2\sin \pi fT|^{2N}(2\cos \pi fT)^{N}. \qquad (2)$$
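
The quoted passband figures can be checked numerically from the magnitude response. The sketch below assumes the gain $|2\sin \pi fT|^{2N}(2\cos \pi fT)^{N}$ with $T = 1/128$ s and $N = 2$, and scans a 0.1 Hz grid; the grid spacing and the names `gain`, `peak` and `passband` are choices made here for illustration.

```python
import math

FS = 128.0  # sampling frequency in Hz, as stated in the text

def gain(f, N=2, fs=FS):
    """Magnitude response |2 sin(pi f T)|^(2N) * (2 cos(pi f T))^N, T = 1/fs."""
    x = math.pi * f / fs
    return abs(2.0 * math.sin(x)) ** (2 * N) * (2.0 * math.cos(x)) ** N

freqs = [k / 10.0 for k in range(1, 640)]      # 0.1 Hz grid below fs / 2
peak = max(freqs, key=gain)                    # frequency of maximum gain
threshold = gain(peak) / math.sqrt(2.0)        # 3 dB below the maximum
passband = [f for f in freqs if gain(f) >= threshold]
print(round(peak), round(passband[0]), round(passband[-1]))  # 39 29 48
```

Setting the derivative of $\sin^{2N}x \cos^{N}x$ to zero gives $\tan^2 x = 2$ for any $N$, so the peak sits at $f = (f_s/\pi)\arctan\sqrt{2} \approx 38.9$ Hz and only the bandwidth changes with $N$.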

As an example, consider a VEP segment buried in two EEG segments
as shown in Figure 1. The signal is given by

$$x(n) = x_{VEP}(n) + x_{EEG1}(n) + x_{EEG2}(n),$$

where $x_{VEP}(n) = A_{VEP}\sin(2\pi n f_{VEP}/f_s)$,
$x_{EEG1}(n) = A_{EEG1}\sin(2\pi n f_{EEG1}/f_s)$ and
$x_{EEG2}(n) = A_{EEG2}\sin(2\pi n f_{EEG2}/f_s)$.


0-7803-7011-2/01/$10.00 ©2001 IEEE




We assume that $f_{VEP}$ = 40 Hz. We choose $f_{EEG1}$ = 15 Hz and
$f_{EEG2}$ = 10 Hz, arbitrarily. An SNR value of -5 dB corresponds
to $A_{EEG1} = A_{EEG2} = 1.8$ with $A_{VEP} = 1.0$, approximately.
Figure 1 shows the plot of $x(n)$ obtained by using these values
for two seconds of data with a sampling frequency, $f_s$, of 128
Hz. As shown in the figure, the three signals are assumed to
exist at different points in time.

Figure 2 shows the data output from the filter with order $N = 2$
for input $x(n)$ given by (1). For the filtered case, the SNRs of
VEP/EEG1 and VEP/EEG2 are approximately 14 dB and 30 dB,
respectively. These improved SNR values indicate the ability of
the digital filter to remove EEG contamination successfully.


Figure  1.  VEP  segment  buried  in  two  EEG  segments. 


Figure  2.  Filtered  VEP. 


The  filter  can  be  realised  using  only  adder  and  delay 
circuits  as  shown  in  Figure  3. 
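
Because the transfer function is a product of first-order factors with coefficients of plus or minus one, it expands into a short FIR filter with integer taps. The sketch below assumes the form $(1 - z^{-1})^{2N}(1 + z^{-1})^{N}$ and builds the taps by repeated polynomial multiplication; the helper names are hypothetical and the code is not a description of the circuit in Figure 3.

```python
import cmath
import math

def convolve(a, b):
    """Multiply two polynomials in z^-1 given as coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def filter_taps(N):
    """Expand (1 - z^-1)^(2N) (1 + z^-1)^N into FIR taps."""
    taps = [1]
    for _ in range(2 * N):
        taps = convolve(taps, [1, -1])   # one delay and one subtraction
    for _ in range(N):
        taps = convolve(taps, [1, 1])    # one delay and one addition
    return taps

def fir(x, taps):
    """Direct-form FIR filter with zero initial conditions."""
    return [sum(h * x[n - k] for k, h in enumerate(taps) if n - k >= 0)
            for n in range(len(x))]

def magnitude(taps, f, fs):
    """Magnitude of the frequency response of the taps at f Hz."""
    return abs(sum(h * cmath.exp(-2j * math.pi * f * k / fs)
                   for k, h in enumerate(taps)))

taps = filter_taps(2)
print(taps)  # [1, -2, -1, 4, -1, -2, 1]
```

The taps sum to zero, so a constant input is rejected once the filter has filled, and their alternating sum is also zero, which places a second null at half the sampling frequency.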


3.  EXPERIMENTAL  METHOD 


VEP  signals  are  extracted  from  20  (10  alcoholic  and  10 
normal)  subjects  with  each  completing  40  trials. 
Measurements  are  taken  for  one  second  from  64 
electrodes  placed  on  the  subject’s  scalp,  which  are 
sampled  at  256  Hz.  The  VEP  signals  are  low  pass  filtered 
using 


z(n)  =  y(n)  +  y(n  - 1) ,  (4) 

where $y(n)$ is the output of the filter discussed in Section 2
and $z(n)$ is the low pass filtered output. This filter has a zero
at 128 Hz, half the 256 Hz sampling rate, and so attenuates the
upper end of the band. Next, the VEP signals are
downsampled by half to obtain an equivalent sampling frequency of
128 Hz, since we are not interested in frequencies higher than 64
Hz for evoked potential analysis. The electrodes are located at
standard sites (Standard Electrode Position Nomenclature,
American Encephalographic Association), as shown in Figure 4. The
VEP data is extracted from subjects while they are exposed to a
single stimulus, a picture of an object chosen from the 1980
Snodgrass and Vanderwart picture set [6]. These pictures are
common black and white line drawings, such as an aeroplane, hand,
banana, bicycle or ball, executed according to a set of rules
that provide consistency of pictorial representation.

The extracted signals are separated from EEG contamination by
using the proposed digital filter. VEP signals with artefact
contamination such as eye blinks are removed in the preprocessing
stage: VEP signals above 70 µV denote the occurrence of eye
blinks.

A periodogram (using the Discrete Fourier Transform method) with
Welch averaging [7] is used to obtain the power spectral density
(PSD) of the extracted VEP. The Welch method is applied with 50%
overlap.

The peak PSD from each channel is concatenated into a single
feature array to be used by a Fuzzy ARTMAP (FA) classifier to
classify these VEP patterns as belonging to the alcoholic
subjects class or the normal subjects class. The fast learning
method is employed to speed up FA training, and a voting strategy
over 10 simulations is used to improve FA classification [4]. The
FA vigilance parameter is varied from 0 to 0.9 in steps of 0.1.


Figure  4.  64  channel  electrode  system. 


Two  experiments  are  simulated  in  the 
experimental  study.  First,  the  VEP  signals  are  filtered  and 
used  in  FA  classification  while  the  second  classification 
experiment  uses  VEP  data  without  filtering.  This 
procedure  is  to  show  the  advantage  of  using  the  filter  to 
remove  overlapping  EEG  from  VEP. 



Figure  5.  Fuzzy  ARTMAP  structure. 


4.  RESULTS 


Table 1 shows the results of the experimental study. It can be
seen that FA classification accuracy using the filtered data is
higher than without filtering. This is because the VEP signals
are contaminated with EEG, and the filtering process removes this
contamination, thereby allowing the visual stimulus to be
represented in the VEP signal. In general, better classification
is obtained with a higher vigilance parameter, with a maximum
classification of 96.5% for a vigilance parameter value of 0.9.


Table 1. FA classification results.

                      VEP classification (%)
Vigilance parameter   With filter   Without filtering
0                     87.25         73.25
0.1                   87.75         72.25
0.2                   88.25         73.00
0.3                   87.00         67.75
0.4                   86.75         71.75
0.5                   86.50         77.00
0.6                   88.00         77.25
0.7                   88.50         79.00
0.8                   90.25         79.50
0.9                   96.50         76.25

5.  CONCLUSION 


This paper has proposed a method of detecting and extracting
single trial VEP signals buried in EEG and noise using digital
filters. FA classification using the PSD of the extracted VEP
data, obtained from subjects during the presentation of pictures
from the Snodgrass and Vanderwart picture set, gives 96.5%
accuracy in differentiating alcoholics from control subjects. The
high classification rate shows that the proposed method is
advantageous in single trial VEP detection and classification.


Abstract: Ventricular fibrillation (VF) is one of the most serious
malignant arrhythmias, usually resulting from the immediate
degeneration of ventricular tachycardia (VT). In order to analyze
the nonlinear dynamics of the cardiac micro-mechanism under VT and
VF rhythms at the cellular level, myocardial cell action
potentials (APs) are investigated under different rhythms: normal
sinus rhythm, VT and VF. On the basis of nonlinear chaotic theory
and symbolic dynamics, we put forward some new definitions, such
as complexity rate, and obtain some useful properties for cellular
electrophysiological analysis. The results of the experiments and
computation show that the myocardial cellular signals under VT and
VF rhythms are different kinds of chaotic signals, and that the
cardiac chaos attractor under VF is higher than that under VT. The
analytical complexity theory shows good promise for clinical
application.

I.  INTRODUCTION 

The study of the pathology and electrophysiology of normal and
abnormal heart rhythms is of great clinical significance
[1][2][3]. Currently, the application of linear and nonlinear
theories in this field focuses on the analysis of body surface
ECGs and heart rate variability. These are all studies conducted
at the level of the whole heart. However, it is also necessary to
study myocardial cell action potentials (APs) during different
rhythms at the cellular level, since APs are the basis of the
ECG. Through the analysis of APs, we can more easily understand
the electrophysiological mechanisms of VT and VF. With the
development of physiological experiments, it has been realized
that the ion channels of the myocardial cell membrane are
nonlinear. APs reflect the nonlinear interaction of ion channels
and contain all kinds of information about cell
electrophysiology. Because of the exchange of material and energy
with the external environment through ion exchange, the
propagation of excitation between cells, and the mechanisms of
the cell itself, various kinds of mechanical, electrical, thermal
and chemical coupling exist among all the parts inside and
outside the cells. Therefore, a myocardial cell can be treated as
a nonlinear system, in which chaos can occur under certain
conditions.

Quantitative chaos analysis has been a very effective method in
nonlinear biosignal processing [4][5][6][7]. However, some recent
studies have pointed out limitations in the application of
traditional chaotic signal analysis methods, such as the
Grassberger and Procaccia (GP) algorithm and the Lyapunov
exponent, because Takens' embedding theory, on which they rely,
was constructed for the chaotic analysis of low-dimensional
attractors [8][9][10]. To address these limitations, this paper
puts forward some novel quantitative methods, such as complexity
rate, complexity dispersity and complexity saturation, based on
symbolic dynamics and nonlinear theory. To some extent, the
extracted complexity information can reflect the mechanisms of
body-fluid control and neurological adjustment. In our study,
myocardial cell APs were measured by the floating microelectrode
technique from isolated rabbit hearts during ventricular
tachycardia and fibrillation.


Hopkins  School  of  Medicine,  Baltimore,  MD  21205  USA 

II.  THEORY  AND  METHODOLOGY 

2.1  L-Z  complexity 

From the viewpoint of dynamics, steady state and periodic motion
are ordered and not complex, but a dynamical system becomes
complex when it enters a chaotic state [11]. For a system, it is
important to characterize its complexity quantitatively
[11][12][13][14][15]. The complexity can be estimated from a
measurable 1-dimensional signal reflecting the comprehensive
interactions of the components of the multi-dimensional system.
For a given finite sequence $S = s_1 s_2 \cdots s_n$, Lempel and
Ziv proposed a useful complexity measure $c(n)$ and gave the
related mathematical definitions and deductions in detail [11].
$c(n)$ can characterize the development of spatiotemporal patterns.
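
As a concrete illustration of $c(n)$, the sketch below counts the components of the Lempel-Ziv parsing of a symbol string: each new component is the shortest substring that cannot be copied from the text preceding its last symbol. The coarse-graining rule in `binarise` (thresholding at the mean) is an assumption made here, since the paper does not state its symbolisation; the implementation favours clarity over speed.

```python
def binarise(x):
    """Coarse-grain a real-valued sequence into '0'/'1' around its mean."""
    mean = sum(x) / len(x)
    return "".join("1" if v > mean else "0" for v in x)

def lz_complexity(s):
    """Number of components in the Lempel-Ziv parsing of the string s."""
    n = len(s)
    c = 0
    start = 0
    while start < n:
        length = 1
        # Grow the component while it can still be copied from the text
        # that precedes its final symbol (overlapping copies are allowed).
        while (start + length <= n
               and s[start:start + length] in s[:start + length - 1]):
            length += 1
        c += 1
        start += length
    return c

print(lz_complexity("0001101001000101"))  # 6: 0|001|10|100|1000|101
```

A constant string gives $c(n) = 2$ and a strictly alternating string gives $c(n) = 3$, regardless of length, which matches the remark that periodic motion is not complex.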

2.2  complexity  rate  and  dispersity 

According to our experiments and simulations, only limited system
information can be extracted from coarse-grained symbolic dynamic
sequences, and information about the speed of change of a
biosignal cannot be obtained from the complexity measure alone.
In the analysis of abnormal cardiac signals, clinicians hope to
identify the precise pathological cause, for instance an
interruption of body-fluid or nervous control, in addition to
extracting the features of the abnormal signal. Complexity rate
information extracted from ECG data records can establish a sound
relationship between biosystem pathology and the human cognition
interface. On the basis of the established complexity measure and
the complexity method of system feature extraction [15][16][17],
we put forward a new method for complexity study: the complexity
rate and dispersity information of a symbolic dynamic system. The
underlying cause of dynamic change in a nonstationary biosystem
can be uncovered with the help of this method [18].

Given a dynamic system time sequence
$X = \{x_1, x_2, \dots, x_i, \dots\}$, there exist subsequences
$L_i = \{x_1, x_2, \dots, x_i\}$, in which $i = 1, 2, \dots, n$.
Using the L-Z complexity measure, the corresponding complexity
can be computed for each subsequence $L_i$; suppose $L_i$
corresponds to complexity $c_i$.

2.2.1  Definition  (Finite  sequence  complexity  sequence)

Suppose that, for a sequence $X = \{x_1, x_2, \dots, x_i, \dots\}$,
there exist subsequences $L_i = \{x_1, x_2, \dots, x_i\}$, in which
$i = 1, 2, \dots, n$. We define $c_X = \{c_1, c_2, \dots, c_n\}$ as
the corresponding complexity measure sequence of the sequence
$X_n$, in which $c_i$ is the sequence complexity of $L_i$ and $X_n$
is the finite time sequence of $X$.

2.2.2  Definition  (Time  sequence  complexity  rate) 

Given  a  finite  time  sequence  X={x,,  x2,  •••  ,  x,,  •••  },  the 
corresponding  finite  complexity  sequence  is  cx={c,  ,c,  ,  •”,cn}, 
we  define  complexity  as  follows: 

c»,  ~cnj 

cc(n)= - -  (1) 

Hi-Hj 

in  which  nrnj  must  at  least  larger  than  Takens’  embedding 
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dimension[8].  We  can  denote  the  complexity  rate  as: 
cc(n)=diff(n)  ;  intituled  as  time  sequence  instantaneous 
complexity.  cc(n)  reflects  the  speed  of  the  complexity  change  of 
the  definite  time  sequence. 

According to this definition, the complexity rate of the
whole time sequence $x(n)$ can be calculated from the slope of
the polynomial fitted to the sequence:

$$cc[x(n)] = \mathrm{DIFF}[x(n)] \qquad (2)$$
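
One way to read Eq. (2) is as the slope of a straight line fitted to the complexity sequence, which matches the later note that a linear fit was chosen. The sketch below assumes an ordinary least-squares fit of $c_i$ against the index $i$; the function name and any normalization are assumptions, not the authors' implementation.

```python
def complexity_rate(c):
    """Least-squares slope of c_i against i = 1..n (degree-1 polynomial fit)."""
    n = len(c)
    idx = range(1, n + 1)
    mean_i = sum(idx) / n
    mean_c = sum(c) / n
    num = sum((i - mean_i) * (ci - mean_c) for i, ci in zip(idx, c))
    den = sum((i - mean_i) ** 2 for i in idx)
    return num / den
```

A sequence whose complexity saturates yields a slope near zero over the saturated part, while a sequence whose complexity keeps growing yields a clearly positive slope.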

2.2.3  Definition  (Average  complexity) 

Given a finite dynamic time sequence $X=\{x_1, x_2, \ldots, x_n\}$,
in which $n<\infty$, the corresponding complexity sequence is
$c_X=\{c_1, c_2, \ldots, c_n\}$.

Then:

$$c_v = \frac{1}{n}\sum_{i=1}^{n} c_i \qquad (3)$$

2.2.4 Definition (Complexity dispersity)

$$FL = \frac{\|C_N - C_r\|}{\sigma_N} \qquad (4)$$

in which $FL$ is the CD (complexity dispersity) of a given time
sequence $\{x_n\}$; $C_N$ is the complexity measure of the sequence;
$C_r$ is the complexity measure of the surrogate data time sequence
$\{x'_n\}$; $\sigma_N$ is the mean square deviation of the surrogate
data time sequence $\{x'_n\}$ (here we employ the Gaussian surrogate
method in our complexity dispersity computation [19][20][21]). The
symbol $\|\cdot\|$ stands for Euclidean distance in the algorithm of
complexity measure.
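
As a worked check of Eq. (4), the short sketch below evaluates the complexity dispersity for scalar complexity values, where the Euclidean distance reduces to an absolute difference. The input numbers are taken from the VT1 and VF1 rows of Tables 3 and 4; the function name is an assumption.

```python
def complexity_dispersity(c_original, c_surrogate, sigma_surrogate):
    """FL = |C_N - C_r| / sigma_N for scalar complexity values."""
    return abs(c_original - c_surrogate) / sigma_surrogate


vt1 = complexity_dispersity(49, 87, 3.56)    # VT1 row of Table 3
vf1 = complexity_dispersity(105, 122, 5.25)  # VF1 row of Table 4
print(round(vt1, 2), round(vf1, 2))          # prints 10.67 3.24
```

Both values agree with the dispersity column of the tables, which supports the reading of Eq. (4) given above.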

From the complexity dispersity definition, we can obtain
the following. If the given time sequence $\{x_n\}$, or the main part
of it, is a stochastic process, the complexity measure of the
surrogate time sequence is proportional to that of the original
time sequence, but the corresponding mean square deviation is
bigger, and that results in a smaller complexity dispersity. On the
other hand, if the given time sequence is a deterministic chaotic
signal, the corresponding complexity dispersity is bigger [21].


2.3 Complexity saturation

As a deterministic periodic dynamic system enters chaos and
then randomness, we observe that the complexity of a periodic
process is a definite value and does not depend on the sampling
start point. When the dynamic system is a low-dimensional chaotic
system, the corresponding complexity is also a definite value and
a saturation phenomenon appears during the complexity
computation. When the dynamic system is in high-dimensional
chaos, the corresponding complexity increases with the length of
the sequence, and the complexity saturation phenomenon is not
easy to observe in spite of its existence. When the dynamic system
diverges into randomness, there is no saturation phenomenon and
the complexity is proportional to the length of the sequence. The
complexity rate of a high-dimensional chaotic system is higher
than that of a low-dimensional chaotic system. It can be concluded
that complexity saturation is a significant parameter in the study
of periodic, chaotic and random processes.


III. EXPERIMENT RESULTS

1. Data acquisition

The experiments were carried out on an isolated heart
experimental set-up at the Johns Hopkins School of Medicine [22].
Large New Zealand rabbits (4-6 kg) were anesthetized and their
hearts were rapidly excised through median sternotomy and
perfused in a Langendorff-style preparation. During normal
sinus rhythm, the peak retrograde perfusion pressure was
maintained at 80 mm Hg. A pair of electrodes attached to the
epicardium was used for pacing the heart or for inducing
fibrillation by 60 Hz AC stimulation. Another separate pair,
consisting of a large area patch electrode and an intraventricular
catheter, was utilized for delivering defibrillation shocks. To
record the APs from a single cell, a floating microelectrode
technique was used [23]: the electrode was constructed of a thin,
coiled silver wire and a standard capillary glass micropipette.
By using a micromanipulator, as shown in Fig 1, the microelectrode
was lowered onto the heart surface until it achieved cell
penetration due to natural gravitational force. The silver wire
maintained the impalement despite heart motion, and several
minutes of continuous recordings were obtained in this manner.
After being filtered and amplified, the experimental data were
collected on tape by a wide band FM recorder, and then digitized
(200 samples per second) on a personal computer using a data
acquisition system. The single cell APs were obtained under the
conditions of NSR (normal sinus rhythm), VT (ventricular
tachycardia) and VF (ventricular fibrillation). VT and VF were
induced by 60 Hz AC electrical stimulation. During VF in vivo,
the heart loses pumping function and the blood pressure in the
coronary arteries is approximately zero. To simulate these
conditions in the isolated heart preparation, perfusion was
stopped after the induction of VF. The action potential
waveforms of the experiment are shown in Fig 2.

Fig 1. Schematic of the isolated animal heart experiment

Fig 2. Time domain waveform of myocardial
cellular APs under the rhythm of NSR, VT and VF


2.  Complexity  comparison  of  VT  and  VF 

Figure 3 is the complexity comparison map of VT and VF,
each with 15 groups of recorded data. Each record has a data
length of 2000 samples and the signal sampling rate is 200 Hz.
In the complexity comparison experiments, every group of AP data
under NSR, VT and VF rhythm was recorded under similar
conditions so that the complexity comparison is valid. Figure 4 is
the average complexity comparison of NSR, VT and VF.

Fig 3. NSR, VT and VF complexity comparison (horizontal axis:
data sequence; curves: VF, VT, NSR)

Fig 4. Average complexity comparison of AP

3.  Complexity  rate  comparison  of  VT  and  VF 

In our complexity computation and detection of VT/VF,
we have utilized all experimental records (30 groups of data).
Each individual data sample included one VF signal, one VT
signal and one random signal. In our experiments and
simulations, the random signal was Gaussian-distributed data
generated by computer simulation. (See Table 1 and Table 2.)


Table 1. Complexity information analysis (2000 sample data, two
effective digits)

Investigation item              Maximum      Average      Complexity
                                complexity   complexity   saturation
Group one     Random signal     98           53           No
              VF rhythm ECG     49           24           No
              VT rhythm ECG     14           7            Yes
Group two     Random signal     95           49           No
              VF rhythm ECG     45           22           No
              VT rhythm ECG     21           13           Yes
Group three   Random signal     97           45           No
              VF rhythm ECG     58           29           No
              VT rhythm ECG     19           11           Yes

Table 2. Complexity rate analysis of the three myocardial cell
AP signals (2000 sample data, three effective digits)

Investigation item         Random    VF rhythm   VT rhythm
                           signal    signal      signal
Group one                  0.935     0.525       0.225
Group two                  0.932     0.542       0.267
Group three                0.962     0.486       0.279
Average complexity rate    0.943     0.518       0.257

(We choose linear fit for the complexity computation.)


4.  Complexity  dispersity  comparison  of  VT  and  VF 


Table 3. Complexity dispersity analysis of myocardial cell AP
signal under VT rhythm

Data               Original     Surrogate    Surrogate data   Complexity
                   data         data         mean square      dispersity
                   complexity   complexity   deviation
Data 1 (VT1)       49           87           3.56             10.67
Data 2 (VT2)       55           89           3.49             9.74
Data 3 (VT3)       47           78           3.45             8.99
Data 4 (VT4)       59           84           3.55             9.86
Data 5 (Gnoise1)   192          193          0.99             1.01
VT average         52.5         84.5         3.51             9.82

Table 4. Complexity dispersity analysis of myocardial cell AP
signal under VF rhythm

Data               Original     Surrogate    Surrogate data   Complexity
                   data         data         mean square      dispersity
                   complexity   complexity   deviation
Data 1 (VF1)       105          122          5.25             3.24
Data 2 (VF2)       101          125          5.58             4.30
Data 3 (VF3)       104          126          4.98             4.42
Data 4 (VF4)       107          118          5.39             2.04
Data 5 (Gnoise2)   194          195          0.99             1.01
VF average         104.3        122.8        5.30             3.50

Table 3 and Table 4 give the complexity dispersity information
of myocardial cell AP signals under VT and VF rhythm. The
complexity and mean square deviation of the VF signals in our
experiments are higher than those of VT. Being standardized on
the basis of stochastic analysis, the complexity dispersity
constructed from chaotic dynamics and stochastic analysis
exhibits better stability and practicability than the GP
algorithm. In the two tables, the complexity dispersity of the
Gaussian stochastic process is about 1, so the Gaussian process is
not a chaotic process but a random one. Through
electrophysiological experiments and data unification, we can
confirm that the myocardial cell AP signals under VT and VF
rhythm are chaotic. The complexity dispersity can, to some
extent, be used quantitatively to distinguish low-dimensional
deterministic chaos from high-dimensional deterministic chaos,
as well as for qualitative analysis of cardiac chaos.

IV.  DISCUSSION 

From the experiments and computation above, we can see
that a myocardial cell can be treated as a nonlinear chaotic
system. As conditions alter, changes in the nonlinear
characteristics of the ion channels cause the dynamical behavior
of the myocardial cell to change correspondingly, which is the
basis of the change of dynamical behavior of the whole heart.
This suggests that we should combine studies at the cellular
level with those at the whole heart level in order to acquire a
better understanding of the characteristics and mechanisms of
various rhythms. We can model the relationship between the
activities of the epicardial cell and the tissue or the whole
heart for detecting different cardiac arrhythmias or developing
new pacing modes for patients with dysfunctions in the
conduction system. Moreover, the development of nonlinear
dynamical information such as complexity rate and dispersity
from 1-dimensional action potentials for studying myocardial
cell electrophysiology is instructive for the bioelectrical
modeling of the ion channels. These dynamical indices can serve
as clinically useful parameters for characterizing different
cardiac rhythms. Our study suggests that ventricular
tachycardia might drive the myocardial cell from normal periodic
motion into quasiperiodic motion and finally into chaos;
however, a chaotic rhythm can be terminated by some measures,
such as drug administration and an ICD (implantable
cardioverter defibrillator). We can extend our investigation to
the controllability of cardiac rhythms based on chaos and
symbolic theory.

V.  CONCLUSION 

Based on symbolic dynamics and nonlinear theory, this
paper puts forward definitions of complexity sequence, sequence
complexity rate, complexity dispersity and complexity
saturation. In our experiments and computation on action
potential signals, we observed credible and satisfactory results
in the analysis of myocardial cell AP signals under the rhythm of
VT and VF. With complexity rate and dispersity, we can not only
understand the cellular mechanism of the serious rhythms of VT
and VF but also construct a basic link between physiology and
detection parameters. Our analytical complexity information
provides an effective and reliable method for the analysis and
detection of cardiac pathology.
ABSTRACT

In basic and clinical research on the brain's response
to injury, electrical signals from the brain, namely
EEG, are useful in providing an immediate signal of
the dysfunction. However, EEG signals have proven
to be difficult to analyze and interpret due to their
complex signal characteristics. There is a critical need
for developing robust and reliable measures that can
be correlated with injury as well as survival. In this
paper, we address the unique problem of quantitatively
characterizing the electrical measures of brain injury
for analysis of brain activity in animal and human
subjects. The key objective is to model the EEG spectrum
and its features so that signaling changes due to injury
can be discovered. We do so with the method of
autoregressive modeling and dominant frequency
analysis. The trends in the electrical signaling
following injury and following resuscitation are
modeled using the cepstral distance derived from the
AR model.


INTRODUCTION 

About 70,000 persons per year are
successfully resuscitated after cardiac arrest in both
hospital and community settings in the United
States. Around 60% of those persons subsequently
die because of extensive brain injury and only 3 to
10% resume their former life-style [3]. The
neurological recovery after successful resuscitation
from cardiac arrest largely influences the morbidity
and mortality of these patients [4]. Despite the
magnitude of the problem, only clinical
neurological assessment is used to monitor brain
injury, and no real-time objective methods to detect
and monitor brain injury exist at the present time.

EEG is a sensitive but nonspecific measure
of brain function [15] and its use in cerebrovascular
diseases is limited [5]. EEG has been used for
prognostication after resuscitation from cardiac
arrest with some success. In most of the
applications, the EEG recording results in long
traces with marked inter-observer variability [6].
qEEG has been used to reduce these difficulties.
This technique has been confined to feature
analysis, conventional power spectrum analysis,
parametric description of EEG through linear
autoregressive (AR) modeling, or frequency analysis
based on the clinically accepted $\delta$, $\theta$, $\alpha$, and $\beta$ waves
[9]. Presently, qEEG has very limited
clinical utility, so it is used mainly as an
investigational tool. The power spectrum has been
widely used to characterize EEG via the Fast
Fourier Transform (FFT) and other power spectrum
density estimation techniques. Linear AR modeling
[7,8] has also been used and was able to reduce
experimental data while preserving important
features such as time-varying changes, dominant
frequency components, as well as their amplitudes
and powers.

We utilize AR modeling to investigate the
transient properties of ongoing EEG, which can be
vital for the early detection of brain injuries. The
brain's response to graded injury will be studied
using quantitative characterizations of EEG signals
based on distance measures, which are methods of
differentiating spectra on the basis of a single
continuous criterion. Our first goal is to show that
the methods of distance measurement analysis
detect significant variation in EEG, which
reflects alteration in cerebral function during injury.
The abilities of the spectrum-based EEG distance
measures, Spectral Distance (SD) and Cepstral
Distance (CD), to detect cerebral dysfunction after
cardiac arrest are compared. Our second goal is
to determine if the distance measures are useful in
providing prognosis of neurological recovery after
global asphyxic injury.

METHODS 

Q-EEG analysis - The specific goal of our
preliminary work was to develop novel tools and
technologies to analyze the brain's electrical
signaling, as measured by Q-EEG. We utilize
autoregressive (AR) modeling to investigate the
transient properties of ongoing EEG signal
characteristics. Such transient changes in EEG are
vital for the early detection of brain injuries. In this
model, the EEG is treated as a realization of the
time series $x[k]$ generated as follows:

$$x[k] = \sum_{i=1}^{P} a_i x[k-i] + e[k]$$

where $P$ is the order of the AR process and $e[k]$ is
the unpredictable part of $x[k]$. AR models were
chosen because AR processes are
capable of approximating the EEG spectrum. With
the knowledge of the AR model parameters which
minimize $E\{e^2[k]\}$, the EEG spectrum magnitude


model  order,  and  w(k)  is  the  error  in  prediction 

[12] 

2. Spectrum generation, $P(z)$, from the
autoregressive coefficients
$P(z) = \sigma^2 / |1 + a(1)z^{-1} + a(2)z^{-2} + \ldots + a(p)z^{-p}|^2$
where $\sigma^2=E\{w(n)^2\}$ is the variance of the
input noise. Peaks in the spectrum show
dominant frequencies.


was estimated, up to a constant scale factor, using

$$S_x(e^{j\Omega}) = \frac{1}{\left|1 - \sum_{n=1}^{P} a_n e^{-j\Omega n}\right|}$$


The  mathematical  scheme  to  build  the  AR  model 
and  to  calculate  the  cepstral  distance  (CD)  metric  is 
illustrated  in  Fig.  1  [2,11].  Preliminary  studies 
reported  that  the  “distance”  metric  rigorously 


3. The power in dominant peaks is given by
the area under the peaks in the power
spectrum [13], based on the residues, or
$\mathrm{Power}(\omega_{\mathrm{dominant}}) = 2\,\mathrm{Re}\{\text{Residue of } P(z)/z \text{ at } z = \exp(j\omega_{\mathrm{dominant}})\}$

We  examine  the  differences  in  the  recovery 
pattern  of  all  three  dominant  frequency  bands 


Fig. 1: Flowchart illustrating the methodology for finding the cepstral coefficients of a time series.

The low-pass filtered EEG is fast-Fourier transformed (FFT) (1). Afterwards the log of the FFT output is taken
and then the inverse FFT is computed. The resulting outputs are the cepstral coefficients. To find the cepstral
distance (CD) between two time series, the sum of the magnitude-squared differences between respective
coefficients is computed.
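
The flow in Fig. 1 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the logarithm is applied to the FFT magnitude (the real cepstrum), the low-pass filtering step is omitted, and a naive DFT replaces an FFT library so that the block is self-contained. All function names are hypothetical.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive discrete Fourier transform (O(n^2)); adequate for short segments."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def cepstrum(x):
    """Real cepstrum: inverse DFT of the log magnitude spectrum."""
    # The floor avoids log(0) for bins with vanishing magnitude (assumption).
    log_mag = [math.log(max(abs(v), 1e-12)) for v in dft(x)]
    return [v.real for v in dft(log_mag, inverse=True)]


def cepstral_distance(x, y):
    """Sum of squared differences between respective cepstral coefficients."""
    return sum((a - b) ** 2 for a, b in zip(cepstrum(x), cepstrum(y)))
```

Scaling a segment by a constant changes only the zeroth cepstral coefficient, so the distance between a segment and its doubled copy is $(\ln 2)^2$; excluding that coefficient would make the measure insensitive to overall gain.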


The CD method has been shown to be more accurate in
correlating the electrical signaling response to the
Neuro Deficit Scoring (NDS). Using a stable EEG
time series and the correct model order, dominant
frequency analysis allows a spectral breakdown of
the EEG from the data itself. Thus, it is unique to
each patient. EEG may not cluster into the traditional
alpha, beta, delta, and theta bands, but this can be
resolved since dominant frequency analysis allows
for customization for the individual patient. An
indication of balanced/proportional recovery in
EEG frequency bands is obtained using the
following steps:

1. Predict the current sample by weighted past
samples, or

$x(k) = w(k) + a(1)x(k-1) + a(2)x(k-2) + \ldots + a(p)x(k-p)$

where $x(k)$ is the data sequence, $a(i)$, $i = 1,\ldots,p$
are the autoregressive (AR) parameters, $p$ is the


$NS = (|P_{lf} - P_{mf}| + |P_{mf} - P_{hf}| + |P_{lf} - P_{hf}|)/(P_{lf} + P_{mf} + P_{hf})$
where $P_{lf}$, $P_{mf}$, and $P_{hf}$ represent the power in
the low, medium, and high dominant
frequency components relative to their
respective baseline values [1].
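
The normalized separation can be computed directly from the three relative band powers. The sketch below is a plain transcription of the formula; the function and argument names are assumptions.

```python
def normalized_separation(p_lf, p_mf, p_hf):
    """NS from band powers expressed relative to their baseline values."""
    spread = abs(p_lf - p_mf) + abs(p_mf - p_hf) + abs(p_lf - p_hf)
    return spread / (p_lf + p_mf + p_hf)
```

When all three bands recover by the same fraction of baseline the numerator vanishes and NS is zero; unequal recovery, for example relative powers of 1.0, 0.5 and 0.25, gives a larger value (about 0.86 here).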

RESULTS 

The graded response to injury was obtained using the
cepstral distance measure. We analyzed the EEG signal
by constructing an AR signal model. As reviewed
earlier, the AR model is appropriate for
characterizing short EEG signal segments and yields
their spectrum, which can then be used to identify the
characteristic spectral peaks (Fig. 2 (a)). From the
peaks of the AR spectra, the dominant frequencies,
where the power in the EEG signal is concentrated, are
identified.
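
A minimal sketch of locating dominant frequencies from an AR model follows. It assumes the prediction convention $x(k) = w(k) + \sum_i a(i)x(k-i)$, so the spectrum denominator carries minus signs; with the opposite sign convention for $a(i)$ the signs flip. The grid size, the function names and the simple local-maximum rule are illustrative assumptions.

```python
import cmath
import math


def ar_spectrum(a, sigma2=1.0, n_freq=512):
    """P(w) = sigma2 / |1 - sum_i a[i-1] e^{-j w i}|^2 on a grid over [0, pi]."""
    freqs = [math.pi * k / (n_freq - 1) for k in range(n_freq)]
    power = []
    for w in freqs:
        denom = 1 - sum(ai * cmath.exp(-1j * w * (i + 1)) for i, ai in enumerate(a))
        power.append(sigma2 / abs(denom) ** 2)
    return freqs, power


def dominant_frequencies(freqs, power):
    """Interior local maxima of the spectrum, in radians per sample."""
    return [freqs[k] for k in range(1, len(power) - 1)
            if power[k - 1] < power[k] > power[k + 1]]
```

A frequency in radians per sample converts to hertz as $f = \omega f_s / (2\pi)$, where $f_s$ is the sampling rate.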
DISCUSSION


This method was used extensively in ischemic brain
injury studies to identify the duration and extent of
electrical function change in response to insults of
different severity, and to identify the various phases
of recovery of electrical function [2, 10]. However,
the measure is not specific, in that it is unable to
predict the outcome beyond the first 20 minutes
following the insult. Also, the measures do not show
any indication of short-lasting
electrophysiological changes such as bursts or seizures.

For  our  experiment,  we  used  the  AR  power 
spectrum  to  develop  a  new  index  of  EEG 
recovery — the  normalized  separation  (NS)  (Fig.  3). 
This  study  showed  that  the  HI  injury  causes  a 
dispersion  or  redistribution  of  power  in  the 
dominant  frequencies.  NS  monitors  the  rate  of 
recovery  for  each  band  with  respect  to  baseline.  A 
high  NS  represents  a  disproportionate  recovery  of 


There are several limitations of our approach. A
critical one is that a pre-injury baseline is needed to
compare the distance measure against the
post-injury measure(s). The measure does not distinguish
power in different frequency bands and different
spectral energy evolutions in these bands. The
measure is not specific, in that the relationship with
the neurodeficit score is not high and that it is
unable to predict the outcome beyond the first 20
minutes following the insult. Also, the measures do
not show any indication of short-lasting
electrophysiological changes such as spindles,
bursts or seizures. The metric tools can only
partially characterize the static features of the
post-injury EEG. We hypothesize that EEG, like many
other biological phenomena, displays both 'static'
and 'dynamic' features. The latter are formed as a
pattern generated by numerous neuro-electrical


Fig. 2: (a) Autoregressive (AR) spectral plots taken
during the course of HI injury and recovery in rat. (1) is
the spectrum during baseline prior to the insult; (2) is
the spectrum after 1.5 min of CA; (3) is after 2 min of CA;
and (4) is after 15 min of recovery.
Fig. 2: (b) shows the CD measure plotted as a
function of time. The CD measure demonstrably
discriminates the injury duration and severity
(Geocadin, et al. 2000).


power,  and  vice  versa.  A  high  NS  implies  a  poor 
recovery  of  the  electrical  function. 


events within the brain's complex structure. The
static features represent the global characteristics of
the system, such as parameters with an evolving
trend, and are suitable for monitoring long-term EEG,
especially during recovery from a trauma. It is the
changes in the dynamical picture of the various
rhythmic and chaotic components of the EEG signal
that make it possible to detect changes in the states
of the underlying mechanisms [14].
Fig. 3. Recovery of the relative
power in the three dominant
frequency bands for two animals
(1-5.5 Hz, 9-14 Hz, and 18-21
Hz). Left: A uniform spectral
recovery resulting in a low NS,
confirmed by a high NDS (good
outcome). Right: Spectral recovery
for an animal with a high NS,
indicating spectral dispersion or
unequal recovery of different
frequency bands. The Neuro Deficit
Score (NDS) of this animal is low,
indicating a bad outcome.
Abstract

In communication theory, information measures
answer two fundamental questions, viz. the ultimate
data compression (given by the entropy) and the ultimate
transmission rate (given by the channel capacity). In the
study of brain function through EEG analysis,
information measures show, first, how
entropy can be used to remove redundancy in EEG,
making it useful for monitoring
brain function in critical conditions, and second,
how information transmission measures describe
normal states, e.g. sleep stages, and divergence from
normal, e.g. epilepsy or ischemic brain injury.


1. Introduction

Shannon's entropy [1] has been accepted as a method
to characterize information content in a signal.
Entropy is defined as a measure of uncertainty of
information in a statistical description of a system [2].
In other words, the entropy is a measure of our
ignorance about the system. Given a discrete random
variable $X$ with alphabet $H=\{x_i\}$ and probability
function $p(x_i)=\Pr(X=x_i)$, $x_i \in H$, the entropy is
defined by

$$H = -\sum_{x_i} p(x_i)\ln p(x_i) \qquad (1)$$

The relative entropy or Kullback-Leibler distance
[3] between two probability functions $p(x)$ and $q(x)$ is
defined as

$$D(p\,\|\,q) = \sum_{x_i} p(x_i)\ln\frac{p(x_i)}{q(x_i)} \qquad (2)$$

Another measure of the correlation between two
systems is mutual information. Consider another
discrete random variable $Y$ with probability function
$q(y_j)$. The joint probability of variables $X$ and
$Y$ is $p(x_i, y_j)$. The mutual information $I(X;Y)$ is the
entropy transferred between the joint distribution and
the product distribution $p(x_i)q(y_j)$, i.e.

$$I(X;Y) = \sum_{x_i}\sum_{y_j} p(x_i,y_j)\ln\frac{p(x_i,y_j)}{p(x_i)q(y_j)} \qquad (3)$$

In recent years, a novel non-logarithmic entropy
(Tsallis entropy) was introduced as a
generalization of Shannon entropy; it
is parameterized by an entropic index
$r$ [4]:

$$H_r = -(r-1)^{-1}\sum_{x_i}\left[p(x_i)^r - p(x_i)\right] \qquad (4)$$

For $r \to 1$ Tsallis entropy coincides with Shannon
entropy.

2.  Application  to  Brain  Rhythms 

Entropy itself is a description of the average uncertainty
over the recorded signal duration. It is not useful for
analyzing nonstationarity. To get the temporal
evolution of entropy, an alternative time-dependent
entropy measure based on a sliding temporal window
technique is applied [5] [6]. Let $\{s(k): k = 1,\ldots,N\}$
denote the raw sampled signal. Now we define a
sliding temporal window $W$ determined by two
parameters: the width $w < N$ and the sliding step
$\Delta < w$. Then the sliding windows are defined by
$W(n; w; \Delta) = \{s(i),\ i = 1 + n\Delta, \ldots, w + n\Delta\}$,
$n = 0, 1, 2, \ldots, [N/\Delta] - w + 1$, where $[x]$ denotes the
integer part of $x$. Within each window $W(n; w; \Delta)$,
we introduce the set $\{I_l : l = 1,\ldots,L\}$ of disjoint
amplitude intervals such that

$$W = \bigcup_{l=1}^{L} I_l \qquad (5)$$

where $L$ is the number of partitions of the amplitudes
in window $W$. Then the entropy can be calculated by
denoting $P^n(I_l)$ the probability that the signal
$s(i) \in W(n; w; \Delta)$ belongs to the interval $I_l$. This
probability is the ratio between the number of $s(i)$-values
of $W(n; w; \Delta)$ found within interval $I_l$ and the
total number of $s(i)$-values in $W(n; w; \Delta)$. By
sliding the window $W$, we can explore the entropy
evolution of the whole data $\{s(k): k = 1,\ldots,N\}$
with Eqs. (1) and (4):

$$WE(n) = -\sum_{l=1}^{L} P^n(I_l)\ln P^n(I_l) \qquad (6)$$

for Shannon, and

$$TDE(n) = -(q-1)^{-1}\sum_{l=1}^{L}\left[P^n(I_l)^q - P^n(I_l)\right] \qquad (7)$$

for Tsallis entropy respectively, where $q$ is the entropic
index (written $r$ in Eq. (4)).

Fig. 1. The mean value (MEAN) and standard deviation (SD) of 1 min TE over time for an experiment with 5 min of
asphyxia (bottom panel). The plots show the characteristic electrophysiological pattern of the EEG during the experiment
(mid panel). The MEAN reflects the changes in EEG spontaneous activity while the SD imprints the spike activity (top
panel).
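
The time-dependent entropies of Eqs. (6) and (7) can be sketched as follows. This is an illustration under stated assumptions: the amplitude intervals are equal-width bins over the range of the whole signal (the text partitions the amplitudes within each window), every window that fits in the record is used, and all names and default values are hypothetical.

```python
import math


def window_probabilities(window, lo, hi, n_bins):
    """Fraction of window samples falling in each of n_bins equal-width intervals."""
    counts = [0] * n_bins
    for v in window:
        k = min(int((v - lo) / (hi - lo) * n_bins), n_bins - 1)
        counts[k] += 1
    return [c / len(window) for c in counts]


def shannon(p):
    """Shannon entropy in nats; empty intervals contribute nothing."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def tsallis(p, q):
    """Tsallis entropy with entropic index q (q != 1)."""
    return -sum(pi ** q - pi for pi in p if pi > 0) / (q - 1)


def time_dependent_entropy(s, w, step, n_bins=10, q=None):
    """Entropy of each sliding window; Shannon if q is None, else Tsallis."""
    lo, hi = min(s), max(s)  # assumes the signal is not constant
    out = []
    for start in range(0, len(s) - w + 1, step):
        p = window_probabilities(s[start:start + w], lo, hi, n_bins)
        out.append(shannon(p) if q is None else tsallis(p, q))
    return out
```

For a uniform distribution over four intervals the Shannon value is $\ln 4$ and the Tsallis value with $q = 2$ is 0.75; as $q$ approaches 1 the Tsallis value approaches the Shannon value.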

Motivated by the belief that brain injury, such as that
caused by global ischemia from cardiac arrest,
results in a reduction in the entropy of brain rhythms,
Bezerianos et al [7] calculated Shannon and Tsallis
TDE in a group of animals recovered from brain
asphyxia [8]. Their findings are in agreement with
those of Martin et al [9]; moreover, they showed
that these measures could be used for monitoring the
recovery from brain asphyxia. The mean value (MEAN) and
standard deviation (SD) of Tsallis TDE were
calculated every 1 min for a period of 4 hr (Fig 1). It
seems that the MEAN and/or the SD can be used for
quantitative assessment of brain recovery, as the
cepstral distance has been used in the past [10].

The problem of information flow in the study of
brain dynamics is an essential one. Vastano and
Swinney first applied the notion of mutual
information (MI) to study the dynamics of
spatiotemporal systems [11]. MI detects linear and
nonlinear statistical dependencies between time
series, whereas the more standard correlation
function measures only their linear dependence. The
MI between measurement $x_i$ generated from system
$X$ and measurement $y_j$ generated from system $Y$ is
the amount of information that measurement $x_i$
provides about $y_j$. Thus, MI is a measure of
dynamical coupling or information transmission
between $X$ and $Y$, and when applied to EEG it may
be postulated to be one measure of functional
connectivity [12]. If one system is completely
independent of another, then and only then is the MI
between the time series generated from these
dynamical systems zero. The spatiotemporal
relationships of multichannel EEG recordings, used to
measure the information transmission between various
cortical areas, have been studied in normal subjects under
different physiological states, e.g. awakening, sleep
and simple arithmetic tasks [13] [14], in epilepsy [15]
[14] [13] and in mental diseases [16].

Xu et al [14] computed the MI between eight EEG
channels and found that the differences between the
waking state with open eyes and sleep are
very significant. However, subjects with their
eyes closed and subjects in light sleep display similar
variations. In most cases the MI showed large fluctuations
that gradually decreased with time. In most cases a
maximum peak in the MI appears in the time span
0-500 ms [16], which can be considered as the time
period after which information generated at any
position within the brain has reached every other
place. The average MI $I(x(t), y(t+\tau))$ (where $\tau$ is the time
delay) between all electrodes over a time span of
500 ms was calculated to represent the information
transmission across different cortical areas in normal
subjects and Alzheimer disease patients. The average
MI distribution is nearly symmetric, suggesting the
presence of fast bidirectional transmission of
information between brain areas. The MI in
Alzheimer disease is lower than in normal controls,
suggesting an association of the EEG abnormalities in
Alzheimer disease patients with functional
impairment of information transmission in long
cortico-cortical connections.

Another approach to EEG information analysis uses an
information distance measure. In research on
cardiac arrest asphyxia, during the hypoxia and
asphyxia phases, we are interested not only in the
evolution of brain electrical activity on each lead, but
also in the relation between different sites. A
time-dependent relative entropy distance measure based
on the Kullback-Leibler entropy is a powerful solution
to the problem. It differs from mutual
information, which is based on conditional
probability. Fig. 2 shows the entropy distance evolution
of an experimental EEG recording that includes (a)
baseline, (b) hypoxia, (c) global asphyxia and early
recovery, and (d) later recovery. The information
distance between each segment and the baseline
EEG is clearly illustrated in the figure.
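
As a rough illustration of a windowed relative-entropy distance to a baseline segment, the sketch below compares amplitude histograms window by window. The histogram estimator, the add-one smoothing, the bin edges and the synthetic "suppressed" signal are all assumptions introduced here; the actual estimator behind Fig. 2 is not specified in this passage.

```python
import math
import random

def histogram(samples, edges):
    """Smoothed amplitude distribution over the bins defined by `edges`."""
    counts = [0] * (len(edges) - 1)
    for s in samples:
        for b in range(len(counts)):
            if edges[b] <= s < edges[b + 1]:
                counts[b] += 1
                break
    total = sum(counts) + len(counts)            # add-one smoothing avoids log(0)
    return [(c + 1) / total for c in counts]

def kl_distance(p, q):
    """Kullback-Leibler divergence D(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def sliding_kl(signal, baseline, window, edges):
    """K-L distance of each non-overlapping window to the baseline distribution."""
    q = histogram(baseline, edges)
    return [kl_distance(histogram(signal[i:i + window], edges), q)
            for i in range(0, len(signal) - window + 1, window)]

# Hypothetical data: two baseline-like windows, then two low-amplitude windows.
random.seed(1)
baseline = [random.gauss(0.0, 1.0) for _ in range(2000)]
signal = ([random.gauss(0.0, 1.0) for _ in range(1000)]
          + [random.gauss(0.0, 0.3) for _ in range(1000)])
edges = [-4.0 + 0.5 * i for i in range(17)]      # 16 bins on [-4, 4)

d = sliding_kl(signal, baseline, 500, edges)
print([round(v, 3) for v in d])
```

Windows drawn from the baseline distribution stay close to zero, while the amplitude-suppressed windows show a clearly larger distance.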


Fig. 2. Kullback-Leibler entropy of a preconditioning rat.
The baseline EEG was chosen as the reference. There
are evident increases during the preconditioning hypoxia
and recovery phases. During the asphyxia, the ECG and
artifacts dominate the EEG recordings. The fluctuations
in the K-L entropy right after the asphyxia are due to
heart rate variation.

3.  Discussion 

We showed that the severity and the progression of
cerebral ischemic injury can be evaluated by TDE of
neuroelectrical activity in experimental animal brain
models. We applied the same methodology to
determine the outcome in human subjects after
cardiac arrest, and we hypothesize that TDE can be
used to determine injury severity and outcome in
human subjects. The use of the method for the analysis
of EEG in epileptic discharges is promising, as has
also been pointed out by others. In conclusion, the
Tsallis entropy is a nonredundant information
measure of brain dynamics, and its application in
different areas of interest is promising.

ABSTRACT

In this paper, a second-order statistics based technique
of blind identification and equalization is proposed for
minimum-phase channels driven by stochastically independent
colored signals. Sufficient identifiability conditions are given.
Unlike most existing blind identification methods, this method
does not require the number of sensors to be greater than
the number of source signals. Simulation results are given to
demonstrate the performance of the proposed algorithm.

1.  INTRODUCTION  AND  PROBLEM 
FORMULATION 

Blind identification and equalization of FIR (finite impulse
response) MIMO (multi-input multi-output) channels
driven by colored signals is a fundamental problem
in a wide range of applications such as speech enhancement,
wireless communications and brain signal analysis.
The existing works on FIR MIMO channels driven by colored
signals include the subspace method [1], the matrix
pencil method [2], the blind identification via decorrelating
subchannels (BIDS) method [3,5] and the blind identification
via decorrelating the whole channel (BIDW) method [6].
These methods require the channel matrix to be irreducible
and/or the number of output signals to be greater than the
number of input signals. In this paper, we develop a blind method
to identify square minimum-phase FIR channels.

Consider  a  FIR  MIMO  channel  described  by 

$$y(n) = H(n) * x(n) + w(n) = \sum_{l=0}^{q} H(l)\,x(n-l) + w(n) \qquad (1)$$

where * denotes convolution, x(n), y(n) are the sequences
of the input and output vectors of dimension m, H(n) is the
sequence of the system's impulse response matrix of dimension
m x m, q is the order of the system's finite impulse
response, and w(n) is the noise vector. An equivalent form
of (1) is:

$$y(n) = H(z)\,x(n) + w(n) \qquad (2)$$

where $H(z) = \sum_{l=0}^{q} H(l)\,z^{-l}$ is the channel operator,
also referred to as the channel matrix.
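
The convolution in (1) can be sketched directly in code. The helper below is a plain-Python illustration, not part of the paper; it assumes x(n) = 0 for n < 0, and the function name, the tap values and the test input are hypothetical.

```python
def fir_mimo_output(H, x, w=None):
    """Evaluate y(n) = sum_{l=0}^{q} H(l) x(n-l) + w(n), with x(n) = 0 for n < 0.

    H: list of q+1 coefficient matrices (each a list of m rows of length m)
    x: list of length-m input vectors; w: optional list of noise vectors
    """
    m = len(H[0])
    y = []
    for n in range(len(x)):
        yn = [0.0] * m
        for l, Hl in enumerate(H):
            if n - l < 0:
                break                      # taps reaching before time 0 contribute nothing
            for r in range(m):
                yn[r] += sum(Hl[r][c] * x[n - l][c] for c in range(m))
        if w is not None:
            yn = [a + b for a, b in zip(yn, w[n])]
        y.append(yn)
    return y

# Toy 2 x 2 channel of degree q = 1 (values chosen only for illustration).
H = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.5, 0.0], [0.2, 0.3]]]
x = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
y = fir_mimo_output(H, x)
print(y)
```

For this input the second output sample mixes the current input through H(0) with the previous input through H(1), which is exactly the intersymbol and cross-channel interference that the blind method has to undo.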

We assume that there are sufficient data so that the
second-order statistics (SOS) of y(n) can be exploited. Then, we
can write the autocorrelation function of y(n) as:

$$C_{yy}(\tau) = \lim_{N\to\infty} \frac{1}{N} \sum_{n=0}^{N-1} y(n)\,y^T(n-\tau)$$

and the power spectral matrix of y(n) as:

$$S_{yy}(z) = \sum_{\tau=-\infty}^{\infty} C_{yy}(\tau)\,z^{-\tau} = H(z)\,S_{xx}(z)\,H^T(z^{-1}) + S_{ww}(z) \qquad (3)$$

where all notations are defined in an obvious way. The
above formulation assumes that the noise w(n) is uncorrelated
with the desired signal x(n).

The aim here is to estimate the channel H(z) and/or
recover the input signals using the power spectral matrix of
y(n).

Our proposed method includes two main steps: 1) blind
signal separation, which aims to find the separator G(z)
such that G(z)H(z) = diag; 2) channel estimation and
signal recovery, which aims to compute the channel from
G(z)H(z) = diag and recover the original input signals.
The next two sections deal with these two steps. In Section 4,
computer simulation results are shown to verify the
effectiveness of the proposed algorithm.

2.  BLIND  SIGNAL  SEPARATION 

In this section, we apply the separation algorithm developed
in [6] to the square channel case. In order to present the
method clearly, we need to introduce the concept of diversity
defined in [6]. Let $h(z) = [h_1(z), h_2(z), \cdots, h_m(z)]^T$ be a
polynomial vector with dimension m, and let its greatest common
divisor (GCD) be c(z); then the mth diversity of h(z),
denoted by $\mathrm{div}_m(h(z))$, is defined as

$$\mathrm{div}_m(h(z)) = \deg(h(z)) - \deg(c(z)) \qquad (4)$$

For $k = 2, 3, \cdots, m-1$, the kth diversity of h(z), denoted
by $\mathrm{div}_k(h(z))$, is defined as the minimum of the kth
diversity of any k-dimensional subvector of h(z), i.e.,

$$\mathrm{div}_k(h(z)) = \min\{\mathrm{div}_k([h_{i_1}(z), h_{i_2}(z), \cdots, h_{i_k}(z)]^T) : 1 \le i_1 < i_2 < \cdots < i_k \le m\} \qquad (5)$$

Correspondingly, for the power spectral matrix, denoted
by $S_{xx}(z) = \mathrm{diag}(s_1(z), s_2(z), \cdots, s_m(z))$, of m independent
colored signals, the kth diversity of $S_{xx}(z)$, denoted
by $\mathrm{div}_k(S_{xx})$, is defined as half of the kth diversity of
the polynomial vector $s(z) = [s_1(z), s_2(z), \cdots, s_m(z)]^T$.
$\mathrm{div}_2(S_{xx})$ is the diversity introduced in [3].
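
Definitions (4) and (5) can be checked on small examples with the sketch below. It is only an illustration under stated assumptions: polynomials are ordinary one-sided polynomials with rational coefficients, stored highest degree first, whereas the power spectra in the text are double-sided; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def trim(p):
    """Copy p as Fractions and drop leading zeros (highest degree first)."""
    p = [Fraction(c) for c in p]
    while p and p[0] == 0:
        p = p[1:]
    return p

def deg(p):
    return len(trim(p)) - 1              # -1 for the zero polynomial

def poly_rem(a, b):
    """Remainder of a divided by b (b nonzero)."""
    a, b = trim(a), trim(b)
    while a and len(a) >= len(b):
        f = a[0] / b[0]
        for i in range(len(b)):
            a[i] -= f * b[i]
        a = trim(a)
    return a

def poly_gcd(a, b):
    """Euclidean algorithm; the result is defined up to a constant factor."""
    a, b = trim(a), trim(b)
    while b:
        a, b = b, poly_rem(a, b)
    return a

def diversity(h, k=None):
    """k-th diversity of the polynomial vector h, following (4) and (5)."""
    k = len(h) if k is None else k
    values = []
    for sub in combinations(h, k):
        g = []
        for p in sub:
            g = poly_gcd(g, p)
        values.append(max(deg(p) for p in sub) - deg(g))
    return min(values)

# h(z) = [(z-1)(z-2), (z-1)(z-3), (z-1)]: common factor (z-1), degree 2.
h = [[1, -3, 2], [1, -4, 3], [1, -1]]
div3 = diversity(h)        # deg(h) - deg(gcd) over the whole vector
div2 = diversity(h, 2)     # minimum over all 2-dimensional subvectors
print(div3, div2)
```

Exact rational arithmetic is used so that the GCD degree is not affected by rounding, which would otherwise make common factors hard to detect.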

Now we present two technical lemmas, which are crucial
for the development of the separation algorithm:

Lemma 1. Let H(z) be an m x m polynomial matrix with
degree q, $S_{xx}(z)$ be the diagonal input power spectra with
$\mathrm{div}_2(S_{xx}(z)) > mq$, and $g(z) = [g_1(z), g_2(z), \cdots, g_m(z)]^T$
be a polynomial vector with $\deg(g(z)) \le (m-1)q$. Then

$$\mathrm{div}_m[g(z)^T S_{yy}(z)] \le q \qquad (6)$$

if and only if $g^T(z)H(z)$ has only one nonzero element.

Proof: Since $S_{xx}(z)$ is diagonal, the sufficiency is
obvious. Now we show the necessity. Assume that $g^T(z)H(z)$
has L nonzero elements; we only need to show that, if $L \ge 2$,

$$\mathrm{div}_m(g^T(z) S_{yy}(z)) > q \qquad (7)$$

Denote

$$g^T(z)H(z) = [c_1(z), c_2(z), \cdots, c_m(z)] \qquad (8)$$

$$S_{xx}(z) = \mathrm{diag}(s_1(z), s_2(z), \cdots, s_m(z)) \qquad (9)$$

$$d(z) = [c_1(z)s_1(z), c_2(z)s_2(z), \cdots, c_m(z)s_m(z)] \qquad (10)$$

Then

$$g^T(z) S_{yy}(z) = d(z)\,H^T(z^{-1}) \qquad (11)$$

Note that $s_i(z)$, $1 \le i \le m$, are double-sided polynomials
and $\deg(c_i(z)) \le mq$; we have

$$\mathrm{div}_m(d(z)) \ge 2\,\mathrm{div}_2(S_{xx}(z)) - mq$$

Since any L x m submatrix of $H^T(z^{-1})$ has at most Lq
zeros (including those at infinity), we have

$$\mathrm{div}_m(g(z)^T S_{yy}(z)) \ge \mathrm{div}_m(d(z)) + q - Lq > q$$

The proof is completed.

This result is rather conservative. If any diagonal function
of $S_{xx}(z)$ does not share common zeros with the other
diagonals, which happens in most cases, the diversity
condition can be much weaker.


Lemma 2. Let H(z) be an m x m polynomial matrix with
degree q, $g(z) = [g_1(z), g_2(z), \cdots, g_m(z)]^T$ be a polynomial
vector with $\deg(g(z)) \le (m-1)q$, $S_{xx}(z)$ be the diagonal
input power spectra with $\mathrm{div}_2(S_{xx}(z)) > (0.5m + 1)q$,
and any two diagonal elements of $S_{xx}(z)$ do not share common
zeros. Then

$$\mathrm{div}_m[g(z)^T S_{yy}(z)] \le q$$

if and only if $g^T(z)H(z)$ has only one nonzero element.

Proof: The proof is similar to that of Lemma 1. Assume
that $g^T(z)H(z)$ has L nonzero elements; we only need to
show that, if $L \ge 2$,

$$\mathrm{div}_m(g^T(z) S_{yy}(z)) > q$$

Since any two of $s_i(z)$, $1 \le i \le m$, do not share common
zeros, we have

$$\mathrm{div}_m(d(z)) \ge 2\,\mathrm{div}_2(S_{xx}(z)) -$$

Note that any L x m submatrix of $H^T(z^{-1})$ has at most Lq
zeros (including those at infinity); it follows that

$$\mathrm{div}_m(g(z)^T S_{yy}(z)) \ge \mathrm{div}_m(d(z)) + q - Lq > q$$

and then the proof is completed.

Remarks. 1). The result of Lemma 2 is still conservative.
If $S_{xx}(z)$ and H(z) are any fixed matrices, the
identification condition can be further relaxed.

2). The channel degree may be unknown; it can be
identified by minimizing q under the condition that g(z) with
degree $(m-1)q$ exists such that $\mathrm{div}_m[g(z)^T S_{yy}(z)] \le q$.

It is known that there exists G(z) with degree $(m-1)q$
such that G(z)H(z) is diagonal. If the input power spectra
satisfy the conditions in Lemma 1 or Lemma 2, we can find
the separator

$$G(z) = [g_1(z) \;\; g_2(z) \;\; \cdots \;\; g_m(z)]^T$$

by searching for $g_i(z)$ such that

$$\mathrm{div}_m(g_i^T(z) S_{yy}(z)) \le \deg(H(z)), \quad i = 1, 2, \cdots, m \qquad (12)$$

In [6], an efficient algorithm was proposed to find $g_i(z)$
satisfying (12). Applying this algorithm, we can find the
separator G(z) such that G(z)H(z) is diagonal. In the
following section, we will show that the channel can be
computed directly from G(z)H(z) = diag.

3.  CHANNEL  ESTIMATION  AND  SIGNAL 
DECONVOLUTION 

The following lemma shows that the channel can be identified
up to scaling and permutation once its separator is obtained.

Lemma 3. Given a nonsingular polynomial matrix G(z),
there exists a unique column-wise coprime polynomial matrix
H(z) (up to column scaling) such that G(z)H(z) is
diagonal.

Proof. Suppose G(z)H(z) = T(z) where H(z) is column-wise
coprime and T(z) is diagonal. Then

$$H(z) = (G(z))^{-1}\,T(z) = \frac{\mathrm{adj}(G(z))\,T(z)}{\det(G(z))}$$

Denote adj(G(z)) = M(z)D(z), where D(z) is a diagonal
polynomial matrix and M(z) is a column-wise coprime
polynomial matrix. Then we have

$$H(z) = M(z)\,\frac{D(z)\,T(z)}{\det(G(z))}$$

Since H(z) is polynomial and M(z) is column-wise coprime,
$D(z)T(z)/\det(G(z))$ must be a scalar diagonal matrix, and the
proof is completed.


Hence,  if  the  channel  is  column-wise  coprime,  then  from 
the  separator  G(z),  we  can  compute  the  channel  H(z).  Now 
we  present  a  time  domain  computation  method. 
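
As a rough numerical illustration of this time-domain computation, the sketch below arranges the coefficients of G(z) in a block-Toeplitz convolution matrix, removes the k-th row of every block, and reads the k-th channel column off the null space of what remains. It assumes NumPy, an exact (noise-free) separator, and a toy 2 x 2 channel of degree 1 whose separator is taken to be adj(H(z)); the function names and numerical values are illustrative and not taken from the paper.

```python
import numpy as np

def convolution_matrix(G, q):
    """Block-Toeplitz matrix mapping the stacked coefficients of a degree-q
    column h(z) to the stacked coefficients of G(z) h(z)."""
    lg, m = len(G) - 1, G[0].shape[0]
    W = np.zeros(((lg + q + 1) * m, (q + 1) * m))
    for c in range(q + 1):
        for k, Gk in enumerate(G):
            W[(c + k) * m:(c + k + 1) * m, c * m:(c + 1) * m] = Gk
    return W

def channel_column(G, q, k):
    """Estimate column k (zero-based) of H(z), up to scale, from the separator G(z)."""
    m = G[0].shape[0]
    W = convolution_matrix(G, q)
    keep = [r for r in range(W.shape[0]) if r % m != k]   # drop row k of every block
    _, _, Vt = np.linalg.svd(W[keep])
    return Vt[-1].reshape(q + 1, m).T     # column l holds the coefficient of z^{-l}

# Toy channel H(z) = H0 + H1 z^{-1} with coprime columns (hypothetical values).
H0 = np.array([[1.0, 0.5], [0.3, 1.0]])
H1 = np.array([[0.4, -0.2], [0.1, 0.3]])

def adj2(A):
    """Adjugate of a 2 x 2 matrix; linear in A, so it applies coefficient-wise."""
    return np.array([[A[1, 1], -A[0, 1]], [-A[1, 0], A[0, 0]]])

G = [adj2(H0), adj2(H1)]                  # G(z) H(z) = det(H(z)) I, degree (m-1)q = 1

cosines = []
for k in range(2):
    est = channel_column(G, 1, k)
    true = np.stack([H0[:, k], H1[:, k]], axis=1)
    cosines.append(abs(np.sum(est * true)) / (np.linalg.norm(est) * np.linalg.norm(true)))
print(cosines)
```

Each recovered column is parallel to the true one, which reflects the scaling ambiguity stated in Lemma 3.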

Denote $l_g = (m-1)q$, $G(z) = \sum_{k=0}^{l_g} G_k z^{-k}$ and

$$W = \begin{bmatrix} G_0 & & \\ G_1 & \ddots & \\ \vdots & \ddots & G_0 \\ G_{l_g} & & G_1 \\ & \ddots & \vdots \\ & & G_{l_g} \end{bmatrix}$$

Let $W_k$ ($k = 1, 2, \cdots, m$) equal W with all its
$((i-1)m + k)$th rows deleted ($i = 1, 2, \cdots, l_g + q + 1$). Then from
G(z)H(z) = diag, it follows

$$W_k h_k = 0$$

where $k = 1, 2, \cdots, m$ and $h_k$ is the kth column of

$$\begin{bmatrix} H_0 \\ H_1 \\ \vdots \\ H_q \end{bmatrix}$$

If all the columns of H(z) have the same degree, then
the solution of the above equation is unique (up to constant
scaling) and we can get the channel parameters directly. If
H(z) has different column degrees, some equations may
have a solution space. Take any one solution and we can
formulate a column polynomial h(z); then, excluding the
common factor, the remainder is the corresponding column
of H(z).

Once the channel H(z) is obtained, exclude the common
row factors of G(z) and we get a row-wise coprime
G(z). Compute $G(z)H(z) = \mathrm{diag}(d_1(z), d_2(z), \cdots, d_m(z))$.
Since G(z) is row-wise coprime, any zero of any $d_i(z)$
must be a zero of H(z). Hence all $d_i(z)$ are minimum
phase if the channel is minimum phase, and the source signals
$x_i(n)$, $i = 1, 2, \cdots, m$, can be recovered uniquely (up
to scaling and permutation) by the deconvolution of
$u_i(n) = g_i^T(z)\,y(n) = d_i(z)\,x_i(n)$ from $d_i(z)$. Another recovery
method is to compute the inverse of the estimated channel
H(z) and recover the input signals by computing
$x(n) = H(z)^{-1} y(n)$ directly.


4. SIMULATIONS


In order to show how well this method performs, we consider
a 3 x 3 FIR channel H(z) of degree 1 driven by three
real speech signals (three sentences from the Linguistic Data
Consortium): 1) "She had your dark suit in greasy wash
water all year"; 2) "Don't ask me to carry an oily rag
like that"; 3) "Draw every outer line first, then fill in the
interior". Each sentence has about 46797 samples at a
sample rate of 16000 Hz. In our simulation example, the first
30000 samples of each are used. In order to guarantee
that the channel is minimum-phase, the channel is selected in
the following way: randomly select two nonsingular matrices
$H_0$, $D$ and let $H(z) = H_0\,(I + \bar{D}\,z^{-1})$, where $\bar{D}$
denotes D scaled down by $\sigma_{\max}(D)$, the maximal singular
value of D. The channel selected in our example is

$$H(z) = \begin{bmatrix} 0.2241 & -1.5004 & 0.1029 \\ 0.3229 & 0.9436 & 0.4934 \\ -0.2951 & 1.8035 & 0.4264 \end{bmatrix} + \begin{bmatrix} -0.3148 & 0.0609 & 0.4503 \\ -0.0659 & 0.1353 & -0.2142 \\ 0.2936 & 0.0972 & -0.4530 \end{bmatrix} z^{-1}$$

Figs. 1-3 show the performance of the algorithm. The channel
estimation relative error is 0.0591. The three original,
mixed and recovered signals are shown in Fig. 1, Fig. 2
and Fig. 3, respectively. The recovery is nearly perfect
except for slight noise on the first recovered signal.


5. CONCLUSION


In this paper, a second-order statistics based blind system
identification method for square minimum-phase channels
has been developed. This method requires the channel to
be column-wise coprime and the input power spectra to be
sufficiently diverse. Unlike most existing blind identification
methods, this method does not require the number of output
signals to be greater than the number of input signals.
Computer simulation has been shown to verify the effectiveness
of the proposed algorithm. However, the present algorithm is
not very robust. Future work is to analyze the performance
robustness and to improve the proposed algorithm to
be more robust against noise and other possible uncertainties.

Fig. 1 The original speech signals
Fig. 2 The mixed speech signals
Fig. 3 The recovered speech signals

ABSTRACT

We consider the problem of blind equalization of nonlinear channels
from the second-order statistics of the channel output. The
channel model is linear in the parameters, with additive terms that
are nonlinear functions of the transmitted symbols. All previous
approaches assume that the corresponding channel matrix has full
column rank, which ensures the existence of linear FIR zero-forcing
equalizers. We show that this assumption is not necessary,
and that under certain circumstances linear FIR equalizers can be
found despite the violation of this assumption. An important
consequence of this fact is that equalization can be effected with a
smaller level of diversity. In this paper necessary and sufficient
conditions on the channel matrix are given. An algorithm for the
computation of the equalizers is also given for those channels
satisfying these conditions, assuming an i.i.d. symbol sequence and
memory dominance of the linear part.

1.  INTRODUCTION 

Recently blind equalization of single-input multiple-output (SIMO)
channels has received considerable attention, due to the fact that
these channels can be perfectly equalized if the equalizer is long
enough and the subchannels are coprime. This equalizer can be
obtained from the second-order statistics (SOS) of the received signal
[7].

With a few exceptions [1, 6, 9], almost all the available
literature on blind equalization is devoted to the linear channel case.
However, many real world communication systems, such as radio
links with high power amplifiers, high-density magnetic and
optical storage channels, etc., exhibit a considerable degree of
nonlinearity. Thus it is of interest to consider blind equalization of
nonlinear channels. Our 1-input, p-output channel model has the
form

$$y(k) = \sum_{i=1}^{q} \sum_{j=0}^{l_i} h_{ij}\,s_i(k-j) + n(k), \qquad (1)$$

where $s_1(k) = a(k)$ is the scalar, stationary input, the terms
$s_i(k) = f_i(a(k), a(k-1), \cdots)$ for $i = 2, \ldots, q$ are scalar
nonlinear causal functions of $a(\cdot)$, $h_{ij}$ are p x 1 coefficient vectors,
and n(k), y(k) are p x 1 signal vectors representing an additive
disturbance and the observed signal, respectively. $n(\cdot)$ and $a(\cdot)$
are assumed independent. This model accommodates polynomial
approximations of nonlinear channels (Volterra models), but the
'basis functions' $s_i(\cdot)$ need not be monomials in principle.
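
A minimal sketch of model (1) in code may help fix the indexing. The two quadratic basis functions are the ones used later in the simulation section; the coefficient vectors, the number of subchannels and the convention that symbols before time 0 are zero are assumptions made here purely for illustration.

```python
import random

def sym(a, k):
    """Symbol a(k), taken as 0 before the start of the record."""
    return a[k] if k >= 0 else 0

# Basis functions s_1, s_2, s_3 evaluated at time k.
basis = [
    lambda a, k: sym(a, k),                    # s_1(k) = a(k), the linear term
    lambda a, k: sym(a, k) * sym(a, k - 1),    # s_2(k) = a(k) a(k-1)
    lambda a, k: sym(a, k) * sym(a, k - 2),    # s_3(k) = a(k) a(k-2)
]

def nonlinear_simo_output(a, h, basis, noise_std=0.0, rng=random):
    """y(k) = sum_i sum_j h[i][j] * s_i(k-j) + n(k); each h[i][j] is a length-p list."""
    p = len(h[0][0])
    s = [[f(a, k) for k in range(len(a))] for f in basis]
    y = []
    for k in range(len(a)):
        yk = [rng.gauss(0.0, noise_std) if noise_std else 0.0 for _ in range(p)]
        for i, taps in enumerate(h):
            for j, hij in enumerate(taps):
                if k - j >= 0:
                    for c in range(p):
                        yk[c] += hij[c] * s[i][k - j]
        y.append(yk)
    return y

# Hypothetical coefficients: p = 2 subchannels, l_1 = 1, l_2 = l_3 = 0.
h = [
    [[1.0, 0.5], [0.2, -0.1]],   # h_10, h_11
    [[0.1, 0.0]],                # h_20
    [[0.0, 0.05]],               # h_30
]
a = [1, -1, 1]
y = nonlinear_simo_output(a, h, basis)
print(y)
```

Viewed this way, the output is linear in the coefficient vectors even though it is nonlinear in the symbols, which is the property the stacked formulation exploits.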

Supported  in  part  by  NSF  grants  CCR-9973133  and  ECS-9970105 


We are interested in equalizer design for the class of channels
(1) using only the SOS of $y(\cdot)$. As shown in [1], under certain
conditions linear finite impulse response (FIR) filters can perfectly
equalize nonlinear SIMO channels of the type (1). For those cases,
[1] presented a blind, deterministic approach for equalizer design.
However, it has been shown in [2] that the conditions in [1] are in
fact conservative. More general sufficient conditions on the channel
and the input signal statistics for SOS-based blind equalizability
were presented in [3].

Observe that (1) could be seen as a linear multiple-input
multiple-output (MIMO) system if we regard the nonlinear terms $s_i(\cdot)$ as
additional inputs. However, standard SOS-based equalization
techniques for MIMO systems usually assume that the different inputs
are uncorrelated (which is no longer true in our setting), and they
only resolve the inputs to within a mixing matrix [7]. In addition,
in our case only the term $s_1(\cdot)$ is of interest.

All previous approaches [1, 2, 3] assume that the so-called
channel matrix constructed from the channel coefficients has full
column rank. In that case linear FIR equalizers always exist.
However, a consequence is that the number of subchannels required by
these schemes must exceed the number of distinct kernels in (1).
This level of diversity may at times be unacceptably high. In an
earlier paper [4], we showed that in a linear multi-user
multichannel setting, this full column rank condition can be relaxed,
and a lower level of diversity can be tolerated. In particular,
suppose that in (1) the $s_i(k)$ are independent users, and the goal is only
to equalize $s_1(k)$. Then [4] gives a necessary and sufficient
condition for equalization of that $s_1(k)$. Clearly this same condition
will also ensure the existence of a linear FIR equalizer for the
nonlinear setting of this paper. Should $l_1$ exceed all other $l_i$, and the
$s_i(k)$ be white and mutually uncorrelated, then [4] also provides
an algorithm that permits the construction of the equalizer from
the output SOS alone. However, in the nonlinear setting one cannot
assume that the $s_i(k)$ are mutually uncorrelated even if a(k)
is white, as the $s_i(k)$ are nonlinear functions of a(k). Thus the
algorithm of [4] cannot be applied to nonlinear channels. The key
contribution of this paper is to formulate an algorithm that
provides the required equalizer from the output SOS, provided the
equalizability conditions of [4] are met, and even if the $s_i(k)$ are
statistically dependent. This algorithm assumes that the memory
of the nonlinear part is strictly less than that of the linear part.
Simulation results are given as evidence of the feasibility of this
procedure.

In our notation, $(\cdot)^T$, $(\cdot)^H$, $(\cdot)^\#$ denote transpose, conjugate
transpose and pseudoinverse respectively; $I_n$, $J_n$ denote respectively
the n x n identity matrix and the shift matrix with ones in
the first subdiagonal and zeros elsewhere, and $e_n$ denotes the n-th
unit vector.

2.  CONDITIONS  FOR  THE  EXISTENCE  OF  LINEAR  FIR 
ZF  EQUALIZERS 

By stacking m consecutive samples of $y(\cdot)$ into

$$Y(k)^T = [\, y(k)^T \;\; y(k-1)^T \;\; \cdots \;\; y(k-m+1)^T \,],$$

one gets

$$Y(k) = H S(k) + N(k), \qquad (2)$$

with $N(k)^T = [\, n(k)^T \;\; n(k-1)^T \;\; \cdots \;\; n(k-m+1)^T \,]$ and
$S(k)^T = [\, S_1^T(k) \;\; S_2^T(k) \,]$ the noise and signal vectors,

$$S_1^T(k) = [\, a(k) \;\; \cdots \;\; a(k-l_1-m+1) \,], \qquad (3)$$

$$S_2^T(k) = [\, s_2(k) \; \cdots \; s_2(k-l_2-m+1) \mid \cdots \mid s_q(k) \; \cdots \; s_q(k-l_q-m+1) \,], \qquad (4)$$

and the channel matrix $H = [\, H_1 \;\; H_2 \;\; \cdots \;\; H_q \,]$, with every
$H_i$ block Toeplitz:

$$H_i = \begin{bmatrix} h_{i0} & \cdots & h_{il_i} & & \\ & \ddots & & \ddots & \\ & & h_{i0} & \cdots & h_{il_i} \end{bmatrix}.$$

For convenience, let $d_1 = m + l_1$, which is the size of $S_1(k)$, the
linear part of the regressor; and $d_2 = l_2 + \cdots + l_q + (q-1)m$,
which is the size of $S_2(k)$ (thus S(k) is $(d_1 + d_2) \times 1$).

Observe that if the channel matrix H has full column rank,
then its pseudoinverse $H^\#$ satisfies $H^\# H = I_{d_1+d_2}$. In that case,
in the noiseless case (N(k) = 0) one obtains from (2) $H^\# Y(k) = S(k)$.
Thus the first $d_1$ rows of $H^\#$ provide zero-forcing (ZF)
equalizers with associated delays 0 through $d_1 - 1$. However, this
also shows the existence of vectors (the last $d_2$ rows of $H^\#$) that
recover all the nonlinear terms $s_i(k)$ and their delays, which is
clearly not necessary since these terms are of no interest to the
receiver. This leads us to ask for necessary and sufficient conditions
on H for the ZF equalizers to exist. First, let us introduce the
following partition of the channel matrix:

$$H = [\, H_l \;\; H_{nl} \,] \quad \text{with} \quad H_{nl} = [\, H_2 \;\; \cdots \;\; H_q \,]. \qquad (5)$$

That is, $H_{nl}$ comprises the 'nonlinear part' of the channel matrix.
Recall that $H_l$ and $H_{nl}$ have sizes $pm \times d_1$ and $pm \times d_2$
respectively. We shall make the following assumption:

A1: $H_l$ has full column rank, and with $r_1 = \mathrm{rank}(H_l)$,
$r_2 = \mathrm{rank}(H_{nl})$, H satisfies $\mathrm{rank}(H) = r_1 + r_2 < pm$.

Observe that if H has full column rank, then Assumption A1
is satisfied but not conversely. The significance of this condition is
reflected in the following result from [4]:

Theorem 1 There exists a $pm \times d_1$ matrix G such that

$$G^H H = [\, I_{d_1} \;\; 0_{d_1 \times d_2} \,] \qquad (6)$$

if and only if Assumption A1 holds.

The columns of G constitute the desired ZF equalizers. The
geometrical interpretation of Theorem 1 is as follows.

Lemma 1 The condition $\mathrm{rank}([\, H_l \;\; H_{nl} \,]) = \mathrm{rank}(H_l) + \mathrm{rank}(H_{nl})$
is equivalent to $\mathrm{range}(H_l) \cap \mathrm{range}(H_{nl}) = \{0\}$,
with range(A) the subspace spanned by the columns of A.

Thus linear FIR ZF equalizers exist iff $H_l$ has full column
rank and no nonzero vector lies in the range space of both $H_l$ and
$H_{nl}$.


3. SOS-BASED EQUALIZER DESIGN


We turn our attention now to the problem of extracting the equalizers
from the SOS of the received signal, assuming that H satisfies
the relaxed rank condition A1. From (2), the covariance of the
received vector $Y(\cdot)$ is given by

$$C_y(l) = \mathrm{cov}[Y(k), Y(k-l)] = H C_s(l) H^H + C_n(l), \qquad (7)$$

with $C_s(l) = \mathrm{cov}[S(k), S(k-l)]$, $C_n(l) = \mathrm{cov}[N(k), N(k-l)]$
the signal and noise covariance matrices. In addition to A1, we
adopt the following standard assumptions:

A2: $n(\cdot)$ is zero-mean, white, with covariance $\sigma_n^2 I_p$.

A3: The covariance matrix $C_s(0)$ is positive definite.

Observe that [4] assumes that $C_s(0)$ is diagonal. This assumption
is not needed here. Under A1 and A2, $\sigma_n^2$ can be estimated
as the smallest eigenvalue of $C_y(0)$. Thus the effect of the
noise can be removed from $C_y(l)$; henceforth we shall assume that
$C_y(l) = H C_s(l) H^H$. A3 is a 'persistent excitation' condition on
$a(\cdot)$, which allows us to write

$$C_s(0) = Q Q^H \quad \text{with } Q \text{ invertible}. \qquad (8)$$

Now let Q be a square root of $C_s(0)$ as in (8), and define the
normalized channel and source covariance matrices respectively
as

$$\bar{H} = HQ, \qquad \bar{C}_s(l) = Q^{-1} C_s(l) Q^{-H}. \qquad (9)$$

Using (9), the matrices $C_y(l)$ become

$$C_y(l) = \bar{H} \bar{C}_s(l) \bar{H}^H, \quad \text{with } \bar{C}_s(0) = I_{d_1+d_2}. \qquad (10)$$

The following result relates the ZF equalizers to the normalized
channel matrix $\bar{H}$.

Lemma 2 Under A1-A3, let the square root of $C_s(0)$, Q, be block
lower triangular:

$$Q = \begin{bmatrix} Q_{11} & 0 \\ Q_{21} & Q_{22} \end{bmatrix} \quad \text{with } Q_{ij} \text{ of size } d_i \times d_j. \qquad (11)$$

Then the matrix G satisfying (6) (ZF equalizers) is given by

$$G^H = Q_{11} [\, I_{d_1} \;\; 0_{d_1 \times d_2} \,] \bar{H}^\#. \qquad (12)$$

Thus if $\bar{H} = U_1 \Sigma V$ is an SVD of $\bar{H}$, with $U_1$: $pm \times (r_1 + r_2)$,
$\Sigma$: $(r_1 + r_2) \times (r_1 + r_2)$, $V$: $(r_1 + r_2) \times (d_1 + d_2)$, and partitioning
V as

$$V = [\, V_1 \;\; V_2 \,], \quad V_i \text{ of size } (r_1 + r_2) \times d_i, \qquad (13)$$

then the equalizers are given by

$$G^H = Q_{11} V_1^H \Sigma^{-1} U_1^H. \qquad (14)$$

Observe that $Q_{11}$ is known to us from the source statistics, and
that $\Sigma$, $U_1$ can be obtained from an SVD of $C_y(0)$ since

$$C_y(0) = H C_s(0) H^H = \bar{H} \bar{H}^H = U_1 \Sigma^2 U_1^H. \qquad (15)$$

Therefore if $V_1$ could be somehow estimated, the ZF equalizers
could be computed. Note that $V V^H = V_1 V_1^H + V_2 V_2^H = I_{r_1+r_2}$.
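
The noise removal and the factorization (15) can be sketched numerically as follows. This is only an illustration under assumptions introduced here: NumPy is used, the exact (not sample) covariance of a random real-valued rank-deficient normalized channel stands in for $C_y(0)$, the rank $r_1 + r_2$ is taken as known, and the function name is hypothetical.

```python
import numpy as np

def denoise_and_factor(Cy0, rank):
    """Estimate the noise variance as the smallest eigenvalue of Cy0 and return
    (sigma2, U1, Sigma) such that Cy0 - sigma2 * I = U1 diag(Sigma**2) U1^H."""
    w, U = np.linalg.eigh(Cy0)            # eigenvalues in ascending order
    sigma2 = w[0]
    w, U = w[::-1], U[:, ::-1]            # reorder to descending
    U1 = U[:, :rank]
    Sigma = np.sqrt(w[:rank] - sigma2)
    return sigma2, U1, Sigma

rng = np.random.default_rng(0)
pm, d, r = 6, 5, 4                        # output size, d1 + d2, r1 + r2 (toy values)
Hbar = rng.standard_normal((pm, r)) @ rng.standard_normal((r, d))   # rank r < pm
sigma2_true = 0.1
Cy0 = Hbar @ Hbar.T + sigma2_true * np.eye(pm)

sigma2, U1, Sigma = denoise_and_factor(Cy0, r)

# Whitening check: Sigma^{-1} U1^H (Cy0 - sigma2 I) U1 Sigma^{-1} should be the identity.
S_inv = np.diag(1.0 / Sigma)
R0 = S_inv @ U1.T @ (Cy0 - sigma2 * np.eye(pm)) @ U1 @ S_inv
print(sigma2, np.allclose(R0, np.eye(r)))
```

Because the channel matrix is rank deficient, at least one eigenvalue of the noisy covariance equals the noise variance exactly, which is why the smallest eigenvalue serves as the estimate.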

An additional property is shown by the next result.

Lemma 3 Under A1-A3, let Q be block lower triangular as in
(11). Let $\bar{H} = HQ$ have a singular value decomposition
$\bar{H} = U_1 \Sigma V$, and partition V as in (13). Then

$$V_1^H V_1 = I_{d_1}, \qquad V_1^H V_2 = 0_{d_1 \times d_2}. \qquad (16)$$

This property is obvious in the full column rank case (for
which V is square), but it is somewhat surprising that it still holds
even under the relaxed rank condition A1. Now consider the matrix

$$R(1) = \Sigma^{-1} U_1^H C_y(1) U_1 \Sigma^{-1}, \qquad (17)$$

which satisfies $R(1) = V \bar{C}_s(1) V^H$. From (16), this gives

$$R(1) V_1 = V \bar{C}_s(1) \begin{bmatrix} I_{d_1} \\ 0 \end{bmatrix}, \qquad R^H(1) V_1 = V \bar{C}_s^H(1) \begin{bmatrix} I_{d_1} \\ 0 \end{bmatrix}. \qquad (18)$$

These relations will allow us to estimate $V_1$ under the following
additional assumptions:

A4: The symbol sequence $a(\cdot)$ is a zero-mean i.i.d. process with
$\mathrm{cov}[a(k), a(k)] = \sigma_a^2$.

A5: $S_2(k)$ satisfies $S_2(k) = f(a(k), a(k-1), \ldots, a(k-d_1+2))$,
with $f(\cdot, \ldots, \cdot)$ a memoryless mapping.

Basically A5 amounts to saying that the memory of the nonlinear
part of the channel is strictly shorter than that of the linear
part. One has the following result:

Lemma 4 Under A1-A5, a lower block triangular square root Q
as in (11) exists such that $Q_{11} = \sigma_a I_{d_1}$, and

$$\bar{C}_s(1) = \begin{bmatrix} J_{d_1} & 0 \\ 0 & C \end{bmatrix} \quad \text{for some } d_2 \times d_2 \text{ matrix } C, \qquad (19)$$

$$\bar{C}_s(d_1 - 1) = e_{d_1} e_1^T. \qquad (20)$$

Substituting  (19)  in  (18)  one  obtains  the  Jordan  chains 

R(  1)V,  =  R"(  1)V,  =  V,  Jjj,  (21) 


which  show  how  \\  can  be  estimated  once  its  first  or  last  column 
is  available.  Partition  I \  =  [  i’i  v2  ■  ■  ■  vj,  ]  columnwise, 
and  consider  the  matrix 


R(ds  -  1)  =  D-'ufCA'h  -  l)t-r,  S_1,  (22) 

which  satisfies  77(7,  —  1)  =  VCA<h  —  1)V'H.  Using  (20), 

$$R(d_1 - 1) = V e_{d_1} e_1^H V^H = v_{d_1} v_1^H. \qquad (23)$$

Thus $R(d_1 - 1)$ is a rank-one matrix and its only nonzero singular
value equals 1. The vectors $v_1$, $v_{d_1}$ can be obtained up to a constant
of the form $e^{j\theta}$ from an SVD of $R(d_1 - 1)$, or alternatively
they can be estimated as

$$\hat v_{d_1} = \frac{R(d_1 - 1)\, e_{i_{\max}}}{\| R(d_1 - 1)\, e_{i_{\max}} \|}, \qquad
\hat v_1 = \frac{R(d_1 - 1)^H e_{j_{\max}}}{\| e_{j_{\max}}^H R(d_1 - 1) \|}, \qquad (24)$$

where

$$i_{\max} = \arg\max \{ \| R(d_1 - 1)\, e_i \|,\ 1 \le i \le d_1 + d_2 \},$$
$$j_{\max} = \arg\max \{ \| e_j^H R(d_1 - 1) \|,\ 1 \le j \le d_1 + d_2 \}.$$

In this way the estimates $\hat v_{d_1}$, $\hat v_1$ are related to the true
quantities $v_{d_1}$, $v_1$ by complex constants with unit modulus. From
these, the remaining columns of $V_1$ can be estimated via either
of the Jordan chains (21), thus obtaining an estimate $\hat V_1$ satisfying
$\hat V_1 = e^{j\theta} V_1$ for some real $\theta$. Therefore the matrix $G_{ZF}$
formed from $U_1$, $S^{-1}$ and $\hat V_1$ satisfies
$G_{ZF}^H \mathcal{H}_1 = e^{j\theta} I_{d_1}$, providing equalization up
to an unknown phase rotation. This is acceptable since the need for
a phase reference can be sidestepped by differentially encoding the
data. Finally, it is possible to obtain the Minimum Mean-Squared
Error (MMSE) equalizers in the spirit of [5]:

Lemma 5 Under Assumptions A1-A3, the MMSE equalizers $G_{MMSE}$
minimizing trace $E[|G^H y(k) - s_1(k)|^2]$ are related to the ZF
equalizers by

$$G_{MMSE} = [I - \sigma_n^2 C_y^{-1}(0)]\, G_{ZF} \qquad (25)$$

where now $C_y(0) = \mathcal{H} C_s(0) \mathcal{H}^H + \sigma_n^2 I$ represents
the undenoised channel output covariance matrix.

The resulting algorithm is summarized next.

Blind equalization algorithm

1. Compute estimates $\hat C_y(0)$, $\hat C_y(1)$, $\hat C_y(d_1 - 1)$.

2. Estimate $\sigma_n^2$ as the smallest eigenvalue of $\hat C_y(0)$ and
subtract the noise effect from $\hat C_y(\cdot)$.

3. Perform an SVD of $\hat C_y(0)$ as in (15) to obtain $U_1$, $S$.

4. Compute $R(1)$, $R(d_1 - 1)$ as in (17), (22) respectively.

5. Form the estimates $\hat v_{d_1}$, $\hat v_1$ via (24).

6. For $i = 2, 3, \ldots, d_1$, let $\hat v_i = R(1)\hat v_{i-1}$. Alternatively,
for $j = d_1, d_1 - 1, \ldots, 2$, obtain $\hat v_{j-1}$ from $\hat v_j$ by
running the Jordan chain (21) in the reverse direction.

7. ZF equalizers: form $G_{ZF}$ from $U_1$, $S^{-1}$ and
$\hat V_1 = [\, \hat v_1 \ \cdots \ \hat v_{d_1} ]$.

8. Compute the MMSE equalizers via (25).


4. SIMULATION RESULTS

We now present a numerical example of the results obtained by the
algorithm. For illustration purposes, the phase ambiguity inherent
to the method was removed before computing the error rates.
Averages were computed based on 100 independent runs.

The channel we consider is real with $q = 3$, $l_1 = 4$, $l_2 = l_3 = 1$
and i.i.d. symbols taking the values ±1 with equal probabilities.
The number of subchannels is $p = 4$; the coefficients are given in
Table 1. The nonlinear terms are $s_2(k) = a(k)a(k - 1)$,
$s_3(k) = a(k)a(k - 2)$. The resulting linear-to-nonlinear distortion ratio
for this channel is 8 dB. The equalizer length that we consider is
$m = 6$. The corresponding channel matrix $\mathcal{H}$ (of size 24 x 24) is
not full column rank but it satisfies the relaxed rank condition A1:

$$23 = \mathrm{rank}(\mathcal{H}) = \mathrm{rank}(\mathcal{H}_1) + \mathrm{rank}([\, \mathcal{H}_2 \ \mathcal{H}_3 ]) = 10 + 13.$$

Note that this channel satisfies A5, and that $\sigma_n^2$ can still be
estimated as the smallest eigenvalue of $\hat C_y(0)$ even though the channel
matrix $\mathcal{H}$ is square, since $\mathcal{H}$ is rank deficient. Figure 1(a)
shows the symbol error rate (SER) vs. SNR using K = 2000 samples for
covariance estimation, while Figure 1(b) shows the variation of the
SER with K for a fixed value of SNR = 24 dB, for the equalization
delays 0, 3, 8 and 9. In this case the equalizer with maximal delay
(d = 9) provides the poorest performance of all. The best results
are obtained with the equalizer of delay d = 3.
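
The remark that the noise variance is still available as the smallest eigenvalue when the square channel matrix is rank deficient can be checked numerically. The following is only a sketch under stated assumptions: the matrix `H`, the value `sigma2` and the unit-covariance regressor are hypothetical stand-ins, not the channel of Table 1, and the exact covariance is used instead of a sample estimate.

```python
import numpy as np

def estimate_noise_variance(C_y0):
    """Smallest eigenvalue of a symmetric covariance matrix (eigvalsh sorts ascending)."""
    return float(np.linalg.eigvalsh(C_y0)[0])

rng = np.random.default_rng(0)

# Hypothetical 24 x 24 channel matrix of rank 23: the last column is made
# linearly dependent on the first two.
H = rng.standard_normal((24, 24))
H[:, -1] = H[:, 0] + H[:, 1]

sigma2 = 0.1  # assumed noise variance
# Assumption: the regressor entries are uncorrelated with unit variance (C_s = I).
C_y0 = H @ H.T + sigma2 * np.eye(24)

sigma2_hat = estimate_noise_variance(C_y0)
# Subtracting the noise effect leaves a covariance with a zero eigenvalue.
C_denoised = C_y0 - sigma2_hat * np.eye(24)
smallest_after = float(np.linalg.eigvalsh(C_denoised)[0])
```

Because `H @ H.T` has a zero eigenvalue, the smallest eigenvalue of `C_y0` equals the noise variance; with a full-rank square matrix this would no longer hold.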
channel   h_10   h_11   h_12   h_13   h_14   h_20   h_21   h_30   h_31
   1       1.0    0.5    0.4    0.2   -0.2    0.2    0.5    0.1   -0.1
   2       0.1    0.6    1.0   -0.4    0.2    0.1    0.25   0.2   -0.2
   3       0.6    0.6    0.1    0.2    0.5    0.2   -0.2   [two entries illegible; column alignment uncertain]
   4       [three entries illegible]   0.1    0.25   0.1   -0.1   [two entries missing; column alignment uncertain]

Table 1: Coefficients of the Volterra channel used in the simulations


Figure 1: MMSE equalizer performance, m = 6. (a) SER vs. SNR, K = 2000 symbols, (b) SER vs. sample size K, SNR = 24 dB.


5.  CONCLUSIONS 

In  contrast  with  the  linear  channel  case,  for  equalizability  of  non¬ 
linear  channels  with  linear  FIR  filters  it  is  not  necessary  that  the 
channel  matrix  have  full  column  rank.  We  have  given  necessary 
and  sufficient  conditions  on  the  channel  matrix  for  this  property 
to  hold.  If  in  addition  the  input  symbol  sequence  is  i.i.d.  and  the 
memory  of  the  nonlinear  part  of  the  channel  is  strictly  shorter  than 
that  of  the  linear  part,  a  blind  algorithm  based  on  the  second-order 
statistics  of  the  channel  output  provides  the  equalizers. 
ABSTRACT

We  prove  that  a  MIMO  (multiple  input  multiple  output) 
blind  deconvolution  problem  for  n  colored  uncorrelated  sig¬ 
nals  can  be  converted  to  n  SIMO  (single  input  multiple  out¬ 
put)  problems,  using  eigenvalue  decomposition  of  a  special 
covariance  matrix,  depending  on  L-dimensional  parameter 
b,  if  appropriate  covariance  matrices  have  sets  of  eigenval¬ 
ues  with  empty  pairwise  intersection.  We  present  a  suffi¬ 
cient  condition  for  this  conversion  and  discuss  how  to  find 
such parameters. We prove that the parameters b for which
this is possible form an open subset of $\mathbb{R}^L$ whose
complement has Lebesgue measure zero.

1.  INTRODUCTION 

The  problems  of  independent  component  analysis  (ICA), 
blind  source  separation  (BSS)  and  multichannel  blind  de- 
convolution  (MBD)  of  source  signals  have  received  wide 
attention  in  various  fields  such  as  biomedical  signal  anal¬ 
ysis  and  processing  (EEG,  MEG,  ECG),  geophysical  data 
processing,  data  mining,  speech  and  image  recognition  and 
enhancement and wireless communications. In such applications
a number of observations are available, of signals or
data that are filtered superpositions of separate signals from
different independent sources, and it is desired to process
the observations so that the outputs correspond to the
separate primary source signals.

Acoustic  applications  include  the  signals  from  several 
microphones  in  a  sound  field  that  is  produced  by  several 
speakers  (the  so-called  cocktail-party  problem)  and  the  sig¬ 
nals  from  several  acoustic  transducers  in  an  underwater  sound 
field  from  the  engine  noises  of  several  ships  (sonar  prob¬ 
lem).  Radio  and  wireless  communication  examples  include 
the  observations  corresponding  to  outputs  of  array  antenna 
elements  in  response  to  several  transmitters,  and  the  obser¬ 
vations  may  also  include  the  effects  of  the  mutual  couplings 
of  the  elements.  Other  radio  communication  examples  arise 


in  the  use  of  polarization  multiplexing  in  microwave  links; 
the  maintenance  of  the  orthogonality  of  the  polarization  can¬ 
not  be  perfect  and  there  is  interference  between  the  separate 
transmissions.  Radar  examples  include  a  superposition  of 
signals  from  different  target  modulating  mechanisms  as  ob¬ 
served  by  multiple  receivers,  such  as  elements  sensitive  to 
different  polarizations. 

To  find  the  original  sound  source  that  was  recorded  with 
microphones  in  a  conference  room,  we  must  cancel  out,  or 
deconvolve,  the  room  impulse  response  from  the  original 
sound  source.  Since  we  have  no  prior  knowledge  of  what 
this  room  impulse  response  is,  we  call  this  process  the  mul¬ 
tichannel  blind  deconvolution  or  cocktail  party  problem. 

Most of the existing algorithms for MBD assume that the
source signals are white, and the additive noise is usually
assumed to be negligibly small (see for instance [4], [8], [9]).

The  main  objective  of  this  paper  is  to  present  a  proce¬ 
dure  for  conversion  of  a  MIMO  deconvolution  problem  to 
several  SIMO  deconvolution  problems  in  presence  of  addi¬ 
tive  white  noise. 

We  note  that  another  idea  for  converting  a  MIMO  prob¬ 
lem  into  SIMO  problems  is  contained  in  [3].  We  refer  to  [7] 
and  references  therein  for  solving  SIMO  problems. 

Here  we  develop  an  idea  in  [5],  where  a  method  for 
deconvolution  of  colored  signals  is  presented,  using  matrix 
pencils. 

2.  PROBLEM  FORMULATION 

Consider a convolutive mixture $\mathbf{x}(k) = (x_1(k), \ldots, x_m(k))^T$
of uncorrelated colored source signals $s_i(k)$, $i = 1, \ldots, n$,
with $m > n$:

$$\mathbf{x}(k) = \sum_{p=0}^{M} \mathbf{H}(p)\, \mathbf{s}(k - p) + \mathbf{n}(k) \qquad (1)$$

or

$$\mathbf{x}(z) = \mathbf{H}(z)\, \mathbf{s}(z) + \mathbf{n}(z) \qquad (2)$$

where $\mathbf{H}(z) = \sum_{p=0}^{M} \mathbf{H}(p) z^{-p}$. We assume that the order
M of the filters is known or can be estimated.

The  problem  is  to  recover  the  original  signals  up  to  ar¬ 
bitrary  scaling,  permutation  and  delays. 

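We introduce the matrices $H_{ij} \in \mathbb{R}^{(N+1) \times (M+N+1)}$ by

$$H_{ij} = \begin{pmatrix}
h_{ij}(0) & \cdots & h_{ij}(M) & 0 & \cdots & 0 \\
0 & h_{ij}(0) & \cdots & h_{ij}(M) & \cdots & 0 \\
\vdots & & \ddots & & \ddots & \vdots \\
0 & \cdots & 0 & h_{ij}(0) & \cdots & h_{ij}(M)
\end{pmatrix}$$

where $h_{ij}(p)$ is the (i, j)-th element of the matrix $\mathbf{H}(p)$, and
the matrix $H \in \mathbb{R}^{m(N+1) \times n(M+N+1)}$ by

$$H = \begin{pmatrix}
H_{11} & H_{12} & \cdots & H_{1n} \\
H_{21} & H_{22} & \cdots & H_{2n} \\
\vdots & \vdots & & \vdots \\
H_{m1} & H_{m2} & \cdots & H_{mn}
\end{pmatrix}.$$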

Denote:

$$\mathbf{s}_i(k) = (s_i(k), s_i(k - 1), \ldots, s_i(k - M - N))^T,$$
$$\mathbf{s}(k) = (\mathbf{s}_1(k)^T, \ldots, \mathbf{s}_n(k)^T)^T,$$
$$\mathbf{x}_i(k) = (x_i(k), x_i(k - 1), \ldots, x_i(k - N))^T,$$
$$\mathbf{x}(k) = (\mathbf{x}_1(k)^T, \ldots, \mathbf{x}_m(k)^T)^T,$$
$$\mathbf{n}_i(k) = (n_i(k), n_i(k - 1), \ldots, n_i(k - N))^T,$$
$$\mathbf{n}(k) = (\mathbf{n}_1(k)^T, \ldots, \mathbf{n}_m(k)^T)^T.$$

Then the convolution problem can be written as

$$\mathbf{x}(k) = H \mathbf{s}(k) + \mathbf{n}(k). \qquad (3)$$

Under the following assumptions the matrix H has full
column rank (see for instance [2], [8] for a proof):

(H1) $\mathbf{H}(z)$ is irreducible (i.e. rank $\mathbf{H}(z) = n$ for all
$z \neq 0$, including $z = \infty$);

(H2) $\mathbf{H}(z)$ is column reduced (i.e. its highest column-degree
coefficient matrix has full rank).

The above assumptions are natural and are used in many
papers: see, for example, [4], [5], [8].

3. ROBUST ORTHOGONALIZATION

In this section we assume that H is a nonsingular square matrix.
This is true if we have the freedom to choose N and m
such that m(N + 1) = n(M + N + 1) and (H1), (H2) are
satisfied.

We use a preprocessing procedure which is not sensitive
to the white noise $\mathbf{n}(k)$ and which allows us to define
a new orthogonal mixing matrix for the preprocessed
data. The idea is to use time-delayed correlation matrices
that are not sensitive to additive white noise and to construct
a positive definite matrix from their linear combination (for
a sufficiently large number of samples), a problem solved in
[1] for instantaneous mixtures by a finite-step global
convergence algorithm [10].

Let us define a time-delayed correlation matrix of the
vector $\mathbf{x}(k)$ by

$$R_x(p) = E\{\mathbf{x}(k)\, \mathbf{x}^T(k - p)\} \qquad (4)$$

and a symmetric matrix $\tilde R_x(p)$ by

$$\tilde R_x(p) = \tfrac{1}{2}\{R_x(p) + R_x^T(p)\}. \qquad (5)$$

Similarly we define analogous matrices $R_s(p)$ and $\tilde R_s(p)$
for the source signals $\mathbf{s}(k)$.

The time-delayed correlation matrices of the observation
vector $\mathbf{x}(k)$ for any $p \neq 0$ satisfy (due to the assumption
of white noise) $\tilde R_x(p) = H \tilde R_s(p) H^T$.

The robust orthogonalization algorithm can be summarized
as follows.

Algorithm Outline: Robust Orthogonalization

1. Find (by the method described in [1]), i.e. choose
or estimate, a set of parameters $\{\alpha_i\}_{i=1}^{K}$ such that the
matrix $C_x(\alpha) = \sum_{i=1}^{K} \alpha_i \tilde R_x(p_i)$ is positive definite.

2. Perform an eigenvalue decomposition (EVD) of $C_x(\alpha)$,
$C_x(\alpha) = U_x \Lambda_x U_x^T$, where the entries of the diagonal
matrix $\Lambda_x$ are the positive eigenvalues of $C_x(\alpha)$, and
compute the preprocessing matrix $Q = \Lambda_x^{-1/2} U_x^T$.

3. Compute the preprocessed data $\mathbf{z}(k) = Q\mathbf{x}(k) =
QH\mathbf{s}(k) + Q\mathbf{n}(k)$.

Remark 1 By defining a new mixing matrix as $A =
QHD^{1/2}$, where $D = \sum_{i=1}^{K} \alpha_i \tilde R_s(p_i)$ is a block diagonal
positive definite matrix, it is easy to show that $C_z(\alpha) =
AA^T = I_r$ (the (r x r) unit matrix, r = m(N + 1)), so A is
orthogonal. This orthogonality condition is necessary for
performing the conversion to SIMO deconvolution problems using
a symmetric EVD. It should be noted that, in contrast to the
standard prewhitening procedure, for our robust orthogonalization
generally $E\{\mathbf{z}\mathbf{z}^T\} \neq I_r$. Also, we have $\mathbf{z}(k) =
A\tilde{\mathbf{s}}(k) + \tilde{\mathbf{n}}(k)$, where $\tilde{\mathbf{s}}(k) = D^{-1/2}\mathbf{s}(k)$,
$\tilde{\mathbf{n}}(k) = Q\mathbf{n}(k)$, so $\tilde{\mathbf{s}}(k)$ are filtered (distorted)
versions of the source signals $\mathbf{s}(k)$.

Remark 2 It is easy to see that the function $\varphi_K$, which
assigns to every $\alpha \in \mathbb{R}^K$ the minimal eigenvalue of the
matrix $C_x(\alpha)$, is concave, so point 1 in the above algorithm
can be realized by any algorithm which searches for a
maximum (which is global) of $\varphi_K$. The robust orthogonalization
is possible if the maximum value of $\varphi_K$ is positive
(which is not known a priori).
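
The three steps of the outline can be sketched as follows. This is only an illustration under simplifying assumptions: an instantaneous (not convolutive) square mixture, AR(1) sources, a single lag with a hand-picked coefficient instead of the search of [1], and hypothetical names (`sym_lagged_cov`, `robust_orthogonalizer`) and numerical values.

```python
import numpy as np

def sym_lagged_cov(X, p):
    """Symmetrized sample estimate of E{x(k) x(k-p)^T}; X has shape (r, T)."""
    T = X.shape[1]
    R = X[:, p:] @ X[:, :T - p].T / (T - p)
    return 0.5 * (R + R.T)

def robust_orthogonalizer(X, lags, alpha):
    """Return (Q, C) with C = sum_i alpha_i R~_x(p_i) and Q = Lambda^{-1/2} U^T.

    Raises ValueError when C is not positive definite, in which case a
    different alpha has to be tried.
    """
    C = sum(a * sym_lagged_cov(X, p) for a, p in zip(alpha, lags))
    lam, U = np.linalg.eigh(C)
    if lam[0] <= 0:
        raise ValueError("C(alpha) is not positive definite")
    return (U / np.sqrt(lam)).T, C

rng = np.random.default_rng(2)
T = 50_000
a = np.array([0.8, 0.6, 0.4])                  # AR(1) poles: colored sources
w = rng.standard_normal((3, T))
S = np.zeros((3, T))
for k in range(1, T):
    S[:, k] = a * S[:, k - 1] + w[:, k]

H = np.array([[1.0, 0.3, 0.2],
              [0.2, 1.0, 0.3],
              [0.3, 0.2, 1.0]])                # hypothetical square mixing matrix
X = H @ S + 0.5 * rng.standard_normal((3, T))  # additive white noise

Q, C = robust_orthogonalizer(X, lags=[1], alpha=[1.0])
Z = Q @ X                                      # preprocessed data z(k) = Q x(k)
```

By construction `Q @ C @ Q.T` is the identity, while the zero-lag covariance of `Z` generally is not, which is the difference from standard prewhitening noted in Remark 1.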
4. EXTRACTION OF FILTERED (DISTORTED)
VERSIONS OF THE INPUT SIGNALS BY A
SYMMETRIC EIGENVALUE PROBLEM

In this section we assume that robust orthogonalization can
be performed, so our model is $\mathbf{z}(k) = A\tilde{\mathbf{s}}(k) +
\tilde{\mathbf{n}}(k)$ (with $\tilde{\mathbf{n}}(k) = Q\mathbf{n}(k)$).

Define a covariance matrix of sensor signals by
$$R_z(p) = E\{\mathbf{z}(k)\, \mathbf{z}(k - p)^T\}$$
and similarly, a covariance matrix of source signals by
$$R_{\tilde s}(p) = E\{\tilde{\mathbf{s}}(k)\, \tilde{\mathbf{s}}(k - p)^T\}.$$

We recall that the source signals are uncorrelated if
$R_{\tilde s}(p)$ are diagonal matrices for every p. If the source
signals are statistically independent, then this condition is
satisfied, but the converse assertion is not always true. We say
that the sources are colored if for some $p_0 \ge 1$ the matrix
$R_{\tilde s}(p_0)$ is a nonzero (diagonal) matrix.

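For a vector $\mathbf{b} \in \mathbb{R}^L$ define

$$Z(\mathbf{b}) = \sum_{p=1}^{L} b_p \tilde R_z(p), \qquad
S(\mathbf{b}) = \sum_{p=1}^{L} b_p \tilde R_{\tilde s}(p). \qquad (6)$$

Then

$$Z(\mathbf{b}) = A\, S(\mathbf{b})\, A^T, \qquad (7)$$

and

$$Z(\mathbf{b}) = A\, \mathrm{diag}\{S_1(\mathbf{b}), \ldots, S_n(\mathbf{b})\}\, A^T, \qquad (8)$$

where $S_i(\mathbf{b}) = \tfrac{1}{2} \sum_{p=1}^{L} b_p \big( E\{\tilde{\mathbf{s}}_i(k)\tilde{\mathbf{s}}_i(k-p)^T\}
+ E\{\tilde{\mathbf{s}}_i(k-p)\tilde{\mathbf{s}}_i(k)^T\} \big) \in \mathbb{R}^{(M+N+1) \times (M+N+1)}$
is a full matrix. Note that the matrix $S_i(\mathbf{b})$ is symmetric
and each diagonal of it has equal elements.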

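Let $V(\mathbf{b})$ be a set of n(M + N + 1) orthonormal eigenvectors
of the matrix $Z(\mathbf{b})$. Denote by $L_i(\mathbf{b})$ the set of all
eigenvalues of $S_i(\mathbf{b})$ and by $\mathbf{v}_{i,j}(\mathbf{b})$, $j = 1, \ldots, M + N + 1$,
those eigenvectors in $V(\mathbf{b})$ which correspond to the
eigenvalues from the set $L_i(\mathbf{b})$.

We introduce the following condition:

$$L_i(\mathbf{b}) \cap L_j(\mathbf{b}) = \emptyset \quad \forall i \neq j. \qquad (\mathrm{DEV}(\mathbf{b}))$$

Theorem 1 Assume that condition (DEV(b)) is satisfied
for some $\mathbf{b} \in \mathbb{R}^L$ and the noise $\mathbf{n}$ is white. Then for
any $i = 1, \ldots, n$ every signal

$$y_{i,j}(k) = \mathbf{v}_{i,j}(\mathbf{b})^T \mathbf{z}(k), \quad j = 1, \ldots, M + N + 1,$$

is a sum of filtered (distorted) versions of the i-th signal $s_i$
plus noise $n_{i,j}(k) = \mathbf{v}_{i,j}(\mathbf{b})^T \tilde{\mathbf{n}}(k)$.


Proof. Let $\mathbf{v}_{i,j}(\mathbf{b})^T A = (\mathbf{u}_{i,j,1}^T, \ldots, \mathbf{u}_{i,j,n}^T)$, where
$\mathbf{u}_{i,j,l} \in \mathbb{R}^{M+N+1}$. We have

$$Z(\mathbf{b})\, \mathbf{v}_{i,j}(\mathbf{b}) = \lambda\, \mathbf{v}_{i,j}(\mathbf{b}),$$

for some $\lambda \in L_i(\mathbf{b})$. Hence, by (7), $S(\mathbf{b}) A^T \mathbf{v}_{i,j}(\mathbf{b}) =
\lambda A^T \mathbf{v}_{i,j}(\mathbf{b})$, therefore, by (8), $S_l(\mathbf{b})\mathbf{u}_{i,j,l} = \lambda \mathbf{u}_{i,j,l}$
for every l. By condition (DEV(b)) we obtain $\mathbf{u}_{i,j,l} = 0$ for $l \neq i$,
therefore, for every $j = 1, \ldots, M + N + 1$,

$$y_{i,j}(k) = \mathbf{v}_{i,j}(\mathbf{b})^T \mathbf{z}(k) = \mathbf{u}_{i,j,i}^T \tilde{\mathbf{s}}_i(k) + n_{i,j}(k).$$

Since the components of the vector $\tilde{\mathbf{s}}_i(k)$ are distorted
versions of the original signal $s_i(k)$ (see Remark 1), their linear
combinations are again distorted versions of the original
signal $s_i(k)$, so the theorem is proved. ■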
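
The mechanism of the proof can be illustrated numerically. The sketch below is not the paper's experiment: it builds a hypothetical orthogonal matrix `A` and random symmetric blocks (not Toeplitz, unlike the true $S_i(\mathbf{b})$) whose spectra are disjoint by construction, and then checks that each eigenvector of $A\,\mathrm{diag}\{S_i\}A^T$, multiplied by $A$, is supported on a single block.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 3, 4        # n sources, block size d (plays the role of M + N + 1)

# Hypothetical orthogonal mixing matrix A, taken from a QR factorization.
A, _ = np.linalg.qr(rng.standard_normal((n * d, n * d)))

# Symmetric blocks with pairwise disjoint spectra (the DEV(b) condition):
# block i has eigenvalues 10*i + 1, ..., 10*i + d.
S = np.zeros((n * d, n * d))
for i in range(n):
    B, _ = np.linalg.qr(rng.standard_normal((d, d)))
    eigs = 10.0 * i + np.arange(1, d + 1)
    S[i * d:(i + 1) * d, i * d:(i + 1) * d] = B @ np.diag(eigs) @ B.T

Z = A @ S @ A.T
Z = 0.5 * (Z + Z.T)                # remove rounding asymmetry
lam, V = np.linalg.eigh(Z)         # eigenvalues in ascending order

def block_energy(v):
    """Energy of v^T A in each of the n blocks (sums to 1 for unit v)."""
    u = A.T @ v
    return np.array([np.sum(u[i * d:(i + 1) * d] ** 2) for i in range(n)])

owners = [int(np.argmax(block_energy(V[:, j]))) for j in range(n * d)]
leak = max(1.0 - block_energy(V[:, j]).max() for j in range(n * d))
```

Here `owners` assigns the first four eigenvectors to block 0, the next four to block 1 and the last four to block 2, and `leak` stays at rounding level, so each projected output depends on one source block only.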

We introduce the following condition for the sources:

$$\forall i,\ j \neq i\ \ \exists\, p_{ij} \ge 1:\quad
E\{s_i(k)s_i(k - p_{ij})\} \neq E\{s_j(k)s_j(k - p_{ij})\}, \qquad (\mathrm{DAF})$$

i.e. the sources have different autocorrelation functions.

The following theorem is an extension (with a more
complicated proof) of the one contained in [6], which considers
instantaneous mixtures.

Theorem 2 Assume that condition (DAF) is satisfied.
Then there exists L such that the condition (DEV(b)) is
satisfied for any $\mathbf{b}$ from an open subset $B \subset \mathbb{R}^L$, whose
complement has Lebesgue measure zero.

Remark 3 The correlation matrices $E\{\mathbf{z}(k)\mathbf{z}(k - p)^T\}$
and consequently $Z(\mathbf{b})$ are not biased by the additive noise
$\mathbf{n}(k)$, provided that it is white (i.i.d.) and independent
of the source signals.

5.  EXTRACTION  OF  FILTERED  (DISTORTED) 
VERSIONS  OF  THE  INPUT  SIGNALS  BY  A 
GENERALIZED  EIGENVALUE  PROBLEM 

In this section we shall consider the case when the robust
orthogonalization is not possible, so either the matrix H is
nonsquare, or the function $\varphi_K$ (see Remark 2) has a
nonpositive maximum value.

Let $V(\mathbf{b}, \mathbf{c})$ be a set of the maximum number of unit,
linearly independent generalized eigenvectors of the matrix
pencil $(Z(\mathbf{b}), Z(\mathbf{c}))$. Denote by $L_i(\mathbf{b}, \mathbf{c})$ the set (possibly
empty) of all generalized eigenvalues of the matrix pencil
$(S_i(\mathbf{b}), S_i(\mathbf{c}))$ and by $\mathbf{v}_{i,j}(\mathbf{b}, \mathbf{c})$, $j = 1, \ldots, m_i$, the set
of those eigenvectors in $V(\mathbf{b}, \mathbf{c})$ which correspond to the
eigenvalues from $L_i(\mathbf{b}, \mathbf{c})$.

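Theorem 3 Assume that the condition
$$L_i(\mathbf{b}, \mathbf{c}) \cap L_j(\mathbf{b}, \mathbf{c}) = \emptyset \quad \forall i \neq j \qquad (\mathrm{DEV}(\mathbf{b}, \mathbf{c}))$$
is satisfied for some vectors $\mathbf{b} \in \mathbb{R}^L$, $\mathbf{c} \in \mathbb{R}^L$. Then for
any $i = 1, \ldots, n$, every signal

$$y_{i,j}(k) = \mathbf{v}_{i,j}(\mathbf{b}, \mathbf{c})^T \mathbf{z}(k), \quad j = 1, \ldots, m_i,$$

is a sum of filtered (distorted) versions of the i-th signal $s_i$
plus noise $n_{i,j}(k) = \mathbf{v}_{i,j}(\mathbf{b}, \mathbf{c})^T \mathbf{n}(k)$.

Proof. Let $\mathbf{v}_{i,j}(\mathbf{b}, \mathbf{c})^T H = (\mathbf{u}_{i,j,1}^T, \ldots, \mathbf{u}_{i,j,n}^T)$, where
$\mathbf{u}_{i,j,l} \in \mathbb{R}^{M+N+1}$. We have

$$Z(\mathbf{b})\, \mathbf{v}_{i,j}(\mathbf{b}, \mathbf{c}) = \lambda Z(\mathbf{c})\, \mathbf{v}_{i,j}(\mathbf{b}, \mathbf{c}),$$

for some $\lambda \in L_i(\mathbf{b}, \mathbf{c})$. Hence

$$H(S(\mathbf{b}) - \lambda S(\mathbf{c}))H^T \mathbf{v}_{i,j}(\mathbf{b}, \mathbf{c}) = 0.$$

Since H has full column rank,

$$S_l(\mathbf{b})\, \mathbf{u}_{i,j,l} = \lambda S_l(\mathbf{c})\, \mathbf{u}_{i,j,l} \quad \forall l = 1, \ldots, n.$$

By condition (DEV(b, c)) we obtain $\mathbf{u}_{i,j,l} = 0$ for $l \neq i$,
therefore, for every $j = 1, \ldots, m_i$,

$$y_{i,j}(k) = \mathbf{v}_{i,j}(\mathbf{b}, \mathbf{c})^T \mathbf{z}(k) = \mathbf{u}_{i,j,i}^T \mathbf{s}_i(k) + n_{i,j}(k),$$

and the conclusion follows as in the proof of Theorem 1. ■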

Theorem 4 Assume that condition (DAF) is satisfied.
Then there exists L such that the condition (DEV(b, c))
is satisfied for any $(\mathbf{b}, \mathbf{c})$ from an open subset $B \subset \mathbb{R}^{2L}$,
whose complement has Lebesgue measure zero.

Remark 4 One situation in which we can check whether
the condition (DEV(b, c)) is satisfied is when the
corresponding generalized eigenvalues of the matrix pencil
$(Z(\mathbf{b}), Z(\mathbf{c}))$ are distinct. This case is considered in [5]
for a matrix pencil (for single delays, so in our presentation
in (6) only one coefficient $b_p$ is nonzero). Our condition
(DEV(b, c)) includes, in particular, this case. When
a robust orthogonalization is possible, the check of condition
(DEV(b)) is straightforward, and due to Theorem 2
(assuming that the condition (DAF) is satisfied), we can
choose the vector $\mathbf{b}$ randomly until this condition is satisfied.

6.  CONCLUSIONS 

We have proved that a MIMO blind deconvolution problem
can be converted to multiple SIMO blind deconvolution
problems using either a symmetric EVD (after robust
orthogonalization, when it is possible), or a generalized eigenvalue
problem for a matrix pencil. If we have the freedom to choose
a large number of observations m = n(M + 1), then under some
conditions this conversion is possible by an m x m matrix,
where n is the number of source signals and M is the
number of delays. In both cases our method is robust
to additive white noise.
ABSTRACT

We  consider  the  problem  of  identifying  a  Multiple-Input  Multiple- 
Output  (MIMO)  finite  impulse  response  system  excited  by  colored 
inputs  with  known  statistics.  Among  other  applications  this  prob¬ 
lem  appears  in  the  context  of  CDMA  communications  systems 
with  spatial  and  temporal  diversity.  We  propose  a  novel  approach 
that  optimizes  a  criterion  involving  spectra  and  cross-spectra  of 
the  system  output.  Simulation  results  indicate  that  the  proposed 
scheme  works  well,  even  for  large  order  systems,  and  is  robust  to 
noise  and  channel  length  mismatch. 

1.  INTRODUCTION 

The blind identification of an m x n Multiple-Input Multiple-Output
(MIMO) system is of great importance in many applications, such
as communications, biomedical engineering and seismology. The
goal of blind system identification is to identify an unknown
system H(z), driven by n unobservable inputs, based on the m
system outputs (n ≤ m), and subsequently use the system estimate to
recover the input signals (sources).

In this paper we deal with the case of an m x n MIMO system
with colored inputs. Many of the existing methods address the
problem using higher-order statistics [8], [10], [12], [14], [4], [2].
There are, however, a few methods that, under certain conditions,
address the problem using second-order statistics only [6], [11],
[16], [7], [3]. An antenna-array CDMA system with spatial and
temporal diversity can be formulated as a MIMO system, where the
system describes multipath and the input statistics depend on the user
codes, which are known. Several algorithms have been proposed
that take advantage of that knowledge [13], [15], [9].

Most of these methods are based on time-domain analysis
and depend on channel length information. In [3], [5] a method
was proposed that uses frequency-domain second-order correlations
to recover the system frequency response within a
frequency-dependent phase ambiguity diagonal matrix. The advantage of a
frequency-domain approach for channel estimation is low sensitivity
to channel length mismatch. In this paper, as in [5], we employ
spectrum and cross-spectrum operations, but rather than using
singular value decomposition, we optimize a criterion involving the
aforementioned quantities. Our experiments indicated that the
proposed approach results in much better channel estimates in terms
of overall normalized mean-square error (ONMSE), while it does
not yield phase ambiguity.

2.  PROBLEM  FORMULATION 

Let us consider an m x n FIR MIMO system with colored inputs.
Let $\mathbf{e}(k) = [e_1(k) \cdots e_n(k)]^T$ be a vector of n statistically
independent zero-mean stationary sources, $\mathbf{h}(l)$ the impulse response
matrix with elements $\{h_{ij}(l)\}$, and $\mathbf{x}(k) = [x_1(k) \cdots x_m(k)]^T$
the vector of observations. Then, the MIMO system output equals:

$$\mathbf{x}(k) = \sum_{l=0}^{L-1} \mathbf{h}(l)\, \mathbf{e}(k - l) \qquad (1)$$

where L is the length of the longest $h_{ij}(k)$, and

$$e_i(k) = \sum_{l=0}^{L_c-1} c_i(l)\, s_i(k - l) \qquad (2)$$

where $s_i(k)$ is a white signal with unit power, and $c_i(k)$, $k =
0, \ldots, L_c - 1$, is the corresponding color. $L_c$ represents the maximum
length in case the colors have different lengths.

For  the  quantities  shown  in  the  above  two  equations  we  will 
make  the  following  assumptions. 

(A1) The inputs $\{s_j(k)\}$ are unknown, wide-sense stationary or
cyclostationary, temporally white, and pairwise uncorrelated.
For simplicity we will assume that they have equal
variances.

(A2) The input colors are known and pairwise non-identical.

(A3) The mixing channels $h_{ij}(k)$ are in general complex.

(A4) Let $\mathbf{H}(\omega)$ be an m x n matrix whose ij-th element is the
N-point DFT of the unknown filter $h_{ij}(l)$, $l = 0, \ldots, L - 1$,
evaluated at frequency $\omega = \frac{2\pi}{N}k$, where k takes values in
$[0, \ldots, N - 1]$. We will assume that $\mathbf{H}(\omega)$ has full column
rank for all $\omega$.
The ultimate goal of blind system estimation/source separation
is to estimate the channel matrix and use the estimate to
subsequently recover the input sources.

By taking the length-N DFT (N > L) of Eq. (1), we obtain its
frequency-domain representation:

$$\mathbf{x}(\omega) \approx \mathbf{H}(\omega)\, \mathbf{e}(\omega) \qquad (3)$$

where $\mathbf{e}(\omega)$ is the N-point DFT of the corresponding segment
of $\mathbf{e}(k)$. Here $\omega$ denotes a discrete frequency of the form
$\omega = \frac{2\pi}{N}k$, $k = 0, \ldots, N - 1$.

The approximate equality above would be replaced with
equality if the sequence $\mathbf{e}(k)$ were periodic with period N.
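
The statement about periodic inputs can be verified with a few lines of NumPy. The sketch below uses hypothetical sizes and random complex taps; it forms one period of the output for an N-periodic input by circular shifts and compares its DFT with the product of the channel DFT and the input DFT.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, L, N = 3, 2, 4, 16
h = rng.standard_normal((L, m, n)) + 1j * rng.standard_normal((L, m, n))
e = rng.standard_normal((n, N)) + 1j * rng.standard_normal((n, N))

# Output over one period when e(k) is N-periodic: time indices wrap modulo N.
x = np.zeros((m, N), dtype=complex)
for l in range(L):
    x += h[l] @ np.roll(e, l, axis=1)      # column k holds e((k - l) mod N)

H = np.fft.fft(h, n=N, axis=0)             # H[k] is the m x n matrix H(w_k)
E = np.fft.fft(e, axis=1)
X = np.fft.fft(x, axis=1)
X_model = np.einsum('kij,jk->ik', H, E)    # H(w_k) e(w_k) for every bin k
```

For a non-periodic input the segment edges break the circular structure, which is why Eq. (3) holds only approximately in general.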

The covariance matrix of the complex stochastic DFT process
$\mathbf{x}(\omega)$ equals:

$$R_x(\omega_1, \omega_2) = E\{\mathbf{x}(\omega_1)\, \mathbf{x}(\omega_2)^H\} \qquad (4)$$
$$\qquad\qquad = \mathbf{H}(\omega_1)\, R_e(\omega_1, \omega_2)\, \mathbf{H}(\omega_2)^H \qquad (5)$$

where the superscript H denotes Hermitian transpose, and
$R_e(\omega_1, \omega_2)$ is the covariance of $\mathbf{e}(\omega)$.

Since the inputs are assumed independent, $R_e(\omega_1, \omega_2)$ is a
diagonal matrix, complex in general except for $\omega_1 = \omega_2$, when
it is real. Since the input colors are assumed known, the matrix
$R_e(\omega_1, \omega_2)$ can be predetermined.

Proposition 1: Under the assumptions (A1)-(A4), the channel
matrix $\mathbf{H}(\omega)$ can be reconstructed up to a complex diagonal matrix
based on $R_x(\omega, \omega)$ and $R_x(\omega, \omega + \alpha)$, $\alpha \neq 0$.

The proof of this proposition can be found in [1]. It is
important to note that the residual ambiguity matrix is diagonal, meaning
that the sources are decoupled, and that it does not depend on
frequency.

In the next section we propose an iterative algorithm for blind
identification of the channel matrix $\mathbf{H}(\omega)$ that is based on the
estimates of $R_x(\omega, \omega)$ and $R_x(\omega, \omega + \alpha)$.

3.  PROPOSED  ALGORITHM 

Our goal is to determine the channel matrix $\mathbf{H}(\omega)$ by using the
knowledge of $R_e(\omega, \omega)$ and $R_e(\omega, \omega + \alpha)$ and estimates of
$R_x(\omega, \omega)$ and $R_x(\omega, \omega + \alpha)$ that can be obtained based on the
system outputs. Let us consider the time-domain representation of
the channel matrix in the following form:

$$\mathbf{H}(z^{-1}) = \mathbf{h}(0) + \mathbf{h}(1)z^{-1} + \cdots + \mathbf{h}(L - 1)z^{-(L-1)} \qquad (6)$$

where $\mathbf{h}(m) = \{h_{ij}(m)\}$. Although the channel length L appears
in (6), as will be demonstrated in the simulations section,
overestimation of L is not very critical.

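We propose an iterative method for obtaining $\mathbf{H}(\omega)$ that is
based on minimizing the following quantity:

$$T(l) = \sum_{k=0}^{N-1} \|D_1(k)\|_F^2 + \sum_{k=0}^{N-1} \|D_2(k; l)\|_F^2 \qquad (7)$$

where $D_1(k)$, $D_2(k; l)$ are samples of $D_1(\omega)$, $D_2(\omega)$ obtained at
$\omega = \frac{2\pi}{N}k$, $k \in [0, N - 1]$, with

$$D_1(k) = R_x(k, k) - \mathbf{H}(k)\, R_e(k, k)\, \mathbf{H}(k)^H \qquad (8)$$
$$\quad = R_x(k, k) - \sum_{m=0}^{L-1} \sum_{n=0}^{L-1} \mathbf{h}(m)\, R_e(k, k)\, \mathbf{h}(n)^H e^{-j\frac{2\pi}{N}(m - n)k} \qquad (9)$$

$$D_2(k; l) = R_x(k, k + l) - \mathbf{H}(k)\, R_e(k, k + l)\, \mathbf{H}(k + l)^H \qquad (10)$$
$$\quad = R_x(k, k + l) - \sum_{m=0}^{L-1} \sum_{n=0}^{L-1} \mathbf{h}(m)\, R_e(k, k + l)\, \mathbf{h}(n)^H e^{-j\frac{2\pi}{N}(mk - n(k + l))} \qquad (11)$$

where $\|\cdot\|_F$ denotes the Frobenius norm and l is an integer in
$[0, \ldots, N - 1]$ defined by $\alpha = \frac{2\pi}{N}l$.

Let us denote with $D_{1R}(k)$ and $D_{1I}(k)$ the real and imaginary
parts of $D_1(k)$, respectively, and similarly with $D_{2R}(k; l)$
and $D_{2I}(k; l)$ the real and imaginary parts of $D_2(k; l)$,
respectively. Then we can write:

$$T(l) = \sum_{k=0}^{N-1} \big[ \mathrm{Tr}(D_1(k) D_1^H(k)) + \mathrm{Tr}(D_2(k; l) D_2^H(k; l)) \big]$$
$$\quad = \sum_{k=0}^{N-1} \big[ \mathrm{Tr}(D_{1R}(k) D_{1R}^T(k) + D_{1I}(k) D_{1I}^T(k))
+ \mathrm{Tr}(D_{2R}(k; l) D_{2R}^T(k; l) + D_{2I}(k; l) D_{2I}^T(k; l)) \big] \qquad (12)$$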


The derivative of T(l) with respect to $\mathbf{h}(i)$ can be computed in
closed form [1], and can be used in any gradient-based algorithm
for minimizing Eq. (7). In our experiments the steepest-descent
method was used, i.e.:

$$\mathbf{h}(i)^{k+1} = \mathbf{h}(i)^k - \mu_k \frac{\partial T(l)}{\partial \mathbf{h}(i)^k}, \qquad (13)$$

where $\mathbf{h}(i)^k$ denotes the updated estimate of $\mathbf{h}(i)$ at the k-th iteration
and $\mu_k$ is the step size.
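
To make the criterion concrete, the sketch below evaluates the cost of Eq. (7) for a candidate set of taps. It is an illustration under assumptions: the function names, array layouts and random diagonal input covariances are hypothetical, the frequency index k + l is taken modulo N, and the closed-form gradient of [1] is not reproduced.

```python
import numpy as np

def channel_dft(h, N):
    """Taps h of shape (L, m, n) -> frequency responses H(k) of shape (N, m, n)."""
    return np.fft.fft(h, n=N, axis=0)

def herm(A):
    """Hermitian transpose applied to every matrix in a stack."""
    return np.conj(np.swapaxes(A, -1, -2))

def cost(h, Rx0, Rxl, Re0, Rel, l):
    """T(l): Rx0[k] ~ R_x(k,k), Rxl[k] ~ R_x(k,k+l); Re0, Rel are the known
    n x n input covariances at the same frequency pairs."""
    N = Rx0.shape[0]
    H = channel_dft(h, N)
    Hl = np.roll(H, -l, axis=0)                 # H(k + l), index modulo N
    D1 = Rx0 - H @ Re0 @ herm(H)
    D2 = Rxl - H @ Rel @ herm(Hl)
    return float(np.sum(np.abs(D1) ** 2) + np.sum(np.abs(D2) ** 2))

rng = np.random.default_rng(5)
L, m, n, N, l = 3, 4, 2, 16, 2
h_true = rng.standard_normal((L, m, n)) + 1j * rng.standard_normal((L, m, n))

# Hypothetical diagonal input covariances for every frequency bin.
Re0 = np.stack([np.diag(rng.uniform(0.5, 2.0, n)) for _ in range(N)]).astype(complex)
Rel = np.stack([np.diag(rng.standard_normal(n) + 1j * rng.standard_normal(n))
                for _ in range(N)])

H = channel_dft(h_true, N)
Rx0 = H @ Re0 @ herm(H)                          # exact output covariances
Rxl = H @ Rel @ herm(np.roll(H, -l, axis=0))

c_true = cost(h_true, Rx0, Rxl, Re0, Rel, l)
c_pert = cost(h_true + 0.1, Rx0, Rxl, Re0, Rel, l)
```

With exact covariances the true taps give a cost at rounding level, while a perturbed channel gives a strictly larger value; a descent method would drive the estimate toward the former.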

Notice that not all the frequencies in Eq. (7) are required for
successful channel reconstruction. The reconstruction can be
based on as few as 2L DFT samples for complex channels. Using
fewer frequencies reduces complexity (the complexity of the
algorithm is proportional to the number of discrete frequencies used).


4.  SIMULATION  RESULTS 

One of the very important applications that can be studied from
the point of view of MIMO system estimation with colored inputs
and known colors is an antenna-array CDMA system. If the
system outputs are taken to be the received signals sampled at the
chip rate, then the system inputs are oversampled versions of the
modulated information-bearing signals, each one colored by the
corresponding user spreading code. The system response
represents multipath between each input and output pair. In particular,
for an n-user CDMA system the i-th receiver baseband signal can
be described by the following equation:

$$x_i(t) = \sum_{j=1}^{n} \sum_{l=-\infty}^{\infty} g_{ij}(t - lT_s)\, s_j(l) \qquad (14)$$

where j is the user index, $s_j(l)$ is the transmitted symbol
sequence, and $T_s$ is the symbol duration. For user j each symbol
is multiplied by the pre-assigned spreading code sequence
$\{c_j(0), \ldots, c_j(L_c - 1)\}$ at $L_c$ times the symbol rate $1/T_s$.
The signature $g_{ij}(t)$ couples the j-th user with the i-th receiver. It
incorporates the known sequence $c_j(k)$ and the unknown channel
$h_{ij}(t)$, which represents the multipath fading environment between
the j-th user and the i-th receiver, and can be described as:

$$g_{ij}(t) = \sum_{m=0}^{L_c-1} h_{ij}(t - mT_c)\, c_j(m) \qquad (15)$$


The i-th receiver baseband signal $x_i(t)$ is sampled at the chip
rate $1/T_c$ to obtain the following discrete-time system:

$$x_i(k) = \sum_{j=1}^{n} \sum_{l=-\infty}^{\infty} g_{ij}(k - lL_c)\, s_j(l) \qquad (16)$$

$$g_{ij}(k) = \sum_{m=0}^{L_c-1} h_{ij}(k - m)\, c_j(m) \qquad (17)$$

It can easily be shown that the last expression can be rewritten
as:

$$x_i(k) = \sum_{j=1}^{n} \sum_{m=k-L}^{k} h_{ij}(k - m)\, e_j(m) \qquad (18)$$

where the process $e_j(k)$ can be viewed as the convolution between
$c_j(k)$ and a version of $s_j(k)$ oversampled by $L_c$.
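
The equivalence between the signature form (16)-(17) and the colored-input form (18) can be checked for a single user and a single receiver. The code, channel and symbol values below are hypothetical, and the sum over users is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(6)
Lc, L, P = 8, 3, 10                       # code length, channel length, symbols
c = rng.choice([-1.0, 1.0], size=Lc)      # hypothetical spreading code c_j
h = rng.standard_normal(L)                # hypothetical chip-rate channel h_ij
s = rng.choice([-1.0, 1.0], size=P)       # symbols s_j(l)

# Route A, Eqs. (16)-(17): signature g = h * c, then superpose shifted copies.
g = np.convolve(h, c)                     # length Lc + L - 1
xA = np.zeros(P * Lc + L - 1)
for l in range(P):
    xA[l * Lc:l * Lc + len(g)] += g * s[l]

# Route B, Eq. (18): colored input e = c * (s upsampled by Lc), then x = h * e.
s_up = np.zeros(P * Lc)
s_up[::Lc] = s
e = np.convolve(c, s_up)[:P * Lc]         # the discarded tail is all zeros
xB = np.convolve(h, e)
```

Both routes produce the same PLc + L - 1 chip-rate samples, which is simply the associativity of convolution.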

In  this  section  we  will  apply  the  proposed  algorithm  on  the  an¬ 
tenna  array  CDMA  system  and  analyze  its  performances  regarding 
both  the  system  identification  and  channel  equalization. 


Figure  2  shows  the  performance  of  the  algorithm  for  3  differ¬ 
ent  data  lengths  and  various  SNRs.  Results  are  based  on  the  50 
Monte  Carlo  runs  and  averaged  over  10  different  5x5  channel 
realizations. 

In  order  to  show  the  robustness  of  the  proposed  algorithm  on 
channel  order  mismatch  we  computed  the  ONMSE  for  the  same 
example  (5x5  system  with  4  -  QAM  inputs,  SNR  =  10dB, 
M  =  2048  and  20  different  channel  sets  with  L  =  5)  for  different 
amount  of  length  mismatch.  The  assumed  channel  lengths,  La , 
were  5,  6  and  7.  The  results  based  on  the  50  Monte  Carlo  runs  are 
shown  in  Fig.  3. 

4.2.  Equalization 

Based  on  the  obtained  system  estimate  a  zero-forcing  block  linear 
equalizer  was  used  to  recover  the  inputs.  Let  us  denote  with  s  the 
combined  data  symbol  vector: 


S  =  [sf,S2  ,• 

■STn] 

(20) 

-[•MV 

-,s\P)]T 

(21) 

where  P  is  the  number  of  symbols  per  user. 

Let  Xj,  i  —  1,  •  •  • ,  m  be  the  data  vector  at  the  ith  receiver.  It 
is  easy  to  show  that  the  following  expression  holds: 

x,  =  A(,)s  +  n;,  i  =  (22) 


4.1.  System  Reconstruction 

We considered a 5-user, 5-antenna CDMA system with 4-level QAM inputs. The channels were generated according to the complex Gaussian distribution with zero mean and unit variance, and normalized with respect to the zero-delay component. The number of multipaths was selected to be L = 5 (5 chip intervals long). The spreading codes were taken to be random sequences of length $L_c = 16$. The number of samples used was M = 4096 (256 symbols), the DFT size was N = 128 and the signal-to-noise ratio was selected to be SNR = 10 dB. The estimates $R_x(k, k)$ and $R_x(k, k+l)$ were obtained by segmenting the received data into $M/N$ segments, computing the DFT of each segment and averaging over all segments as in Eq. (4). The number of frequencies used for the optimization was F = 32. The frequency spacing used was l = 8. The selection of the frequency spacing plays an important role. As already discussed, $R_e(k, k)$ and $R_e(k, k+l)$ are assumed to be known for all frequencies $k = 0, \ldots, N-1$. However, since the number of DFT points, N, is in general larger than the length of the colors $L_c$, the resolution does not allow us to use any frequency spacing l with the same accuracy. Selecting $l = N/L_c$ resolves this problem.

As the measure of performance, the normalized mean-square error (NMSE) was used. The overall NMSE (ONMSE) was obtained by averaging over all cross-channels:

$$\mathrm{ONMSE} = \frac{\sum_{i=1}^{m} \sum_{j=1}^{n} \mathrm{NMSE}_{ij}}{mn} \qquad (19)$$


The simulations were repeated for 20 randomly selected 5x5 channel realizations, for various data lengths and SNRs. For each channel set, the ONMSE was computed based on 50 independent input realizations. The ONMSEs corresponding to the 20 different channels are shown in Fig. 1.


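where $\mathbf{n}_i$ is the noise vector at the ith receiver and $\mathbf{A}^{(i)} = \{A^{(i)}_{jk}\}$ is the $(P L_c + L - 1) \times (nP)$ matrix defined as:

$$A^{(i)}_{(q_1 - 1)L_c + q_2,\ (q_3 - 1)P + q_1} = g_{i q_3}(q_2), \quad q_1 = 1, \ldots, P;\ \ q_2 = 1, \ldots, L_c + L - 1;\ \ q_3 = 1, \ldots, n \qquad (23)$$

$$A^{(i)}_{jk} = 0 \quad \text{elsewhere}$$

where $g_{ij}(k)$ was defined in Eq. (17).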

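The zero-forcing block linear equalizer can now be implemented as (assuming $\mathbf{R}_{n_i} = E\{\mathbf{n}_i \mathbf{n}_i^H\} = \sigma_n^2 \mathbf{I}$):

$$\hat{\mathbf{s}}_{zf} = \frac{1}{m} \sum_{i=1}^{m} \left( \mathbf{A}^{(i)H} \mathbf{A}^{(i)} \right)^{-1} \mathbf{A}^{(i)H} \mathbf{x}_i \qquad (24)$$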
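The averaging of per-receiver zero-forcing estimates in Eq. (24) can be sketched as follows. This is only an illustration under stated assumptions: the function name `zf_block_equalizer`, the matrix sizes and the random test matrices are hypothetical, and each pseudo-inverse is evaluated with a least-squares solve rather than by forming the inverse explicitly.

```python
import numpy as np

def zf_block_equalizer(A_list, x_list):
    """Average the per-receiver zero-forcing estimates of s, as in Eq. (24)."""
    estimates = []
    for A, x in zip(A_list, x_list):
        # (A^H A)^{-1} A^H x, computed as a least-squares solution
        s_hat, *_ = np.linalg.lstsq(A, x, rcond=None)
        estimates.append(s_hat)
    return np.mean(estimates, axis=0)

# Hypothetical noiseless example: m receivers observing the same symbol vector
rng = np.random.default_rng(0)
m, rows, cols = 3, 40, 10
s = rng.choice([-1, 1], cols) + 1j * rng.choice([-1, 1], cols)   # 4-QAM symbols
A_list = [rng.standard_normal((rows, cols)) + 1j * rng.standard_normal((rows, cols))
          for _ in range(m)]
x_list = [A @ s for A in A_list]
s_zf = zf_block_equalizer(A_list, x_list)
```

In the noiseless case each receiver recovers the symbols exactly, so the average does as well; with noise, the average combines the m individual estimates.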

For a 5 x 5 system, a typical signal at the output of the equalizer for 4-QAM inputs, SNR = 10 dB, L = 5 and data length M = 4096 is shown in Fig. 4. The bit-error rate (BER) of the recovered signal is shown in Fig. 5. The results were obtained from 50 Monte Carlo runs for two different data lengths used for system identification. Solid lines correspond to the case with no mismatch ($L_a = L = 5$), while dashed lines represent the case when $L_a = 7$.
Figure 3: ONMSE for different channel order mismatches (L = 5 is the true length)


Figure 1: ONMSE for different data lengths and SNR = 10 dB


Figure 2: ONMSE for different SNRs and data lengths


Figure 4: The output of the equalizer for 4-QAM input signals, SNR = 10 dB and data length M = 4096


Figure 5: BER corresponding to 4-level QAM inputs
ABSTRACT

This paper introduces a robust approach to subspace-based blind channel identification. The technique is based on estimating the noise subspace from the sample sign covariance matrix. The theoretical motivation for the technique is shown under the white Gaussian noise assumption. A simulation study is performed to demonstrate the robust performance of the algorithm in both Gaussian and non-Gaussian noise. The results indicate that when the noise is Gaussian, the proposed method performs similarly to the standard subspace method. When the noise is heavy-tailed, the proposed method outperforms the conventional subspace technique.

1.  INTRODUCTION 

Blind channel identification allows for improving spectral efficiency. It may be achieved using only second-order statistics by employing a Single-Input Multi-Output (SIMO) model resulting from fractional sampling or the use of an antenna array [1]. A subspace method that identifies the channel from the eigenvalue decomposition of the covariance matrix was proposed in [2].

Typically, the noise in the received signal is assumed to be spatially and temporally white Gaussian noise, and the eigenvalues and corresponding noise subspace eigenvectors needed in channel identification are computed from the sample covariance matrix. The sample covariance matrix is known to perform poorly in the face of heavy-tailed noise. This is of concern in wireless communication applications, in particular in urban and indoor radio channels, where the ambient noise has been shown to be decidedly non-Gaussian [3]. Consequently, the estimated eigenvectors and eigenvalues may deviate significantly from the true ones.

In this paper, we propose a robust subspace identification method that performs almost optimally in Gaussian noise and highly reliably in non-Gaussian heavy-tailed noise. The sample covariance matrix used in [2] is replaced by a sample sign covariance matrix, which uses a multivariate generalization of the univariate sign function. Theoretical motivation of the method is shown under the white Gaussian noise assumption. The simulation results demonstrate that the performance is almost equal to that of the original method in Gaussian noise, and that it remains highly reliable even in heavy-tailed noise such as Cauchy. The performance of the original method deteriorates significantly, and it may fail completely, in such noise conditions. The additional robustness of the proposed method is achieved without any significant increase in computational complexity.

Financial support for this work was provided by the Academy of Finland.


The paper is organized as follows. The sample sign covariance matrix to be employed in blind identification is defined in section 2. The SIMO signal model is then given in section 3. Section 4 briefly describes the original blind subspace identification method by Moulines et al. [2]. The method is based on noise subspace eigenvectors. In section 5 we show how these eigenvectors may be estimated in a robust manner, hence yielding robust estimates of the channel coefficients. In section 6, simulation examples where the received signal is contaminated by Gaussian and heavy-tailed noise are presented. Finally, section 7 concludes the paper.

2.  SIGN  COVARIANCE  MATRIX 

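We begin by defining the sample Sign Covariance Matrix (SCM) used in this article. For an M-variate complex vector $\mathbf{x}$, the spatial sign function is defined as

$$\mathbf{S}(\mathbf{x}) = \begin{cases} \mathbf{x} / \|\mathbf{x}\|, & \mathbf{x} \neq \mathbf{0} \\ \mathbf{0}, & \mathbf{x} = \mathbf{0} \end{cases}$$

where $\|\mathbf{x}\| = (\mathbf{x}^H \mathbf{x})^{1/2}$. For an M-variate complex data set $\mathbf{x}_1, \ldots, \mathbf{x}_K$, the sample SCM is

$$\hat{\mathbf{S}}_1 = \frac{1}{K} \sum_{i=1}^{K} \mathbf{S}(\mathbf{x}_i) \mathbf{S}^H(\mathbf{x}_i).$$

The (theoretical) SCM for the distribution F is defined by

$$\boldsymbol{\Sigma}_1 = E_F\{\mathbf{S}(\mathbf{x}) \mathbf{S}^H(\mathbf{x})\},$$

where $\mathbf{x}$ is distributed according to F. Various properties of the SCM have been discussed in [4, 5].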
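A minimal sketch of the sample SCM computation is given below, assuming the observations are stored as the rows of a K x M array. The function name `sample_scm` and the test data are hypothetical. The example also illustrates why the estimator is insensitive to outlying magnitudes: rescaling an observation by a positive factor leaves its spatial sign, and hence the SCM, unchanged.

```python
import numpy as np

def sample_scm(X):
    """Sample sign covariance matrix of the rows of X (K observations, M variates)."""
    X = np.asarray(X, dtype=complex)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    # Spatial sign of each observation; zero vectors are mapped to zero
    S = np.divide(X, norms, out=np.zeros_like(X), where=norms > 0)
    # (1/K) * sum_i S(x_i) S(x_i)^H, with S(x_i)^T stored as row i of S
    return S.T @ S.conj() / X.shape[0]

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3)) + 1j * rng.standard_normal((200, 3))
scm = sample_scm(X)

# Turn one observation into a gross outlier: the SCM does not change
X_outlier = X.copy()
X_outlier[0] *= 1e6
scm_outlier = sample_scm(X_outlier)
```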

3.  SIGNAL  MODEL 

Let $s_n$ be a symbol emitted at time $nT$, where T is the symbol duration. We assume the standard SIMO baseband signal model [6], in which the received signal $\mathbf{x}_n$, having P components arranged as a column vector, is of the form

$$\mathbf{x}_n = \sum_{k=0}^{L} \mathbf{h}_k s_{n-k} + \mathbf{v}_n. \qquad (1)$$

Here $\{\mathbf{h}_k\}$ is the channel impulse response sequence, L is the channel order and $\mathbf{v}_n$ is the noise. This SIMO model results either from sampling the received signal from P sensors at the symbol rate or from oversampling the signal received by a single sensor with sampling period $\Delta = T/P$ [1].
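By stacking N + 1 observations of (1) into an $(N+1)P \times 1$ vector $\mathbf{X}_n = [\mathbf{x}_n^T, \mathbf{x}_{n-1}^T, \ldots, \mathbf{x}_{n-N}^T]^T$ we may write

$$\mathbf{X}_n = \mathcal{H}_N \mathbf{S}_n + \mathbf{V}_n. \qquad (2)$$

Here, $\mathbf{S}_n = [s_n, s_{n-1}, \ldots, s_{n-N-L}]^T$, $\mathbf{V}_n = [\mathbf{v}_n^T, \mathbf{v}_{n-1}^T, \ldots, \mathbf{v}_{n-N}^T]^T$ and $\mathcal{H}_N$ is the $(N+1)P \times (L+N+1)$ channel convolution matrix given by

$$\mathcal{H}_N = \begin{bmatrix} \mathbf{h}_0 & \mathbf{h}_1 & \cdots & \mathbf{h}_L & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{h}_0 & \mathbf{h}_1 & \cdots & \mathbf{h}_L & \cdots & \mathbf{0} \\ \vdots & & \ddots & & & \ddots & \vdots \\ \mathbf{0} & \cdots & \mathbf{0} & \mathbf{h}_0 & \mathbf{h}_1 & \cdots & \mathbf{h}_L \end{bmatrix}.$$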
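The block-Toeplitz structure of the channel convolution matrix can be checked numerically. The sketch below is an illustration under assumptions: the helper name `channel_convolution_matrix`, the channel dimensions and the random channel are hypothetical. It builds the matrix from the taps $\mathbf{h}_0, \ldots, \mathbf{h}_L$ and verifies that the stacked noiseless observations of (1) agree with the matrix form (2).

```python
import numpy as np

def channel_convolution_matrix(h, N):
    """h: (L+1, P) array whose row k is h_k. Returns the (N+1)P x (L+N+1) matrix."""
    L1, P = h.shape                      # L1 = L + 1 taps
    H = np.zeros(((N + 1) * P, L1 + N), dtype=complex)
    for b in range(N + 1):               # block row b corresponds to x_{n-b}
        for k in range(L1):
            H[b * P:(b + 1) * P, b + k] = h[k]
    return H

rng = np.random.default_rng(2)
P, L, N = 4, 2, 3
h = rng.standard_normal((L + 1, P)) + 1j * rng.standard_normal((L + 1, P))
s = rng.choice([1, -1, 1j, -1j], size=50)

def x(t):
    """Noiseless received vector at time t, following Eq. (1)."""
    return sum(h[k] * s[t - k] for k in range(L + 1))

n = 20
X_n = np.concatenate([x(n - b) for b in range(N + 1)])   # stacked observations
S_n = s[n - np.arange(N + L + 1)]                        # [s_n, ..., s_{n-N-L}]
H_N = channel_convolution_matrix(h, N)
```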

4.  CHANNEL  IDENTIFICATION 


We now review the basic steps of the subspace-based identification method first introduced in [2]. Assume that $\mathbf{X}_n$ given in (2) is a wide-sense stationary process and that the signal $\mathbf{S}_n$ and noise $\mathbf{V}_n$ are mutually independent. The covariance matrix of $\mathbf{X}_n$ is

$$E\{\mathbf{X}_n \mathbf{X}_n^H\} = \boldsymbol{\Sigma}_0 = \mathcal{H}_N \boldsymbol{\Sigma}_s \mathcal{H}_N^H + \boldsymbol{\Sigma}_v,$$

where $\boldsymbol{\Sigma}_s = E\{\mathbf{S}_n \mathbf{S}_n^H\}$ is the signal covariance matrix and $\boldsymbol{\Sigma}_v = E\{\mathbf{V}_n \mathbf{V}_n^H\}$ is the noise covariance matrix. We assume that the standard assumptions ensuring channel identifiability hold (see [2] for details). These assumptions require that the subchannels resulting from forming the SIMO model do not share common zeros. The transmitted symbols are assumed to be i.i.d. between successive time instants and the noise covariance matrix is assumed to be $\boldsymbol{\Sigma}_v = \sigma^2 \mathbf{I}$, where $\sigma^2$ is the noise power. The maximum channel order L is assumed to be known as well.

The covariance matrix $\boldsymbol{\Sigma}_0$ of the received signal can be represented in terms of its eigenvector decomposition. Based on the pattern of the eigenvalues, one can decompose the space into signal and noise subspaces. The eigenvectors corresponding to the L + N + 1 largest eigenvalues span the signal subspace, which coincides with the space spanned by the columns of the channel matrix $\mathcal{H}_N$. The remaining $r = (P-1)N + P - L - 1$ eigenvectors span the noise subspace. The corresponding eigenvalues are all equal to the noise variance $\sigma^2$. Denote the noise subspace eigenvectors by $\mathbf{g}_i$, $i = 1, \ldots, r$. It is a standard result that

$$\mathcal{H}_N^H \mathbf{g}_i = \mathbf{0}, \quad i = 1, \ldots, r.$$


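This orthogonality of the signal and noise subspaces allows for identification of the channel coefficient vector

$$\mathbf{h} = [\mathbf{h}_0^T, \mathbf{h}_1^T, \ldots, \mathbf{h}_L^T]^T.$$

To illustrate how the identification is done, partition the noise subspace eigenvectors as

$$\mathbf{g}_i = [\mathbf{g}_0^{(i)T}, \mathbf{g}_1^{(i)T}, \ldots, \mathbf{g}_N^{(i)T}]^T, \qquad (3)$$

where $\mathbf{g}_k^{(i)}$, $k = 0, 1, \ldots, N$, are of size P x 1. Define

$$\mathcal{G}_i = \begin{bmatrix} \mathbf{g}_0^{(i)} & \mathbf{g}_1^{(i)} & \cdots & \mathbf{g}_N^{(i)} & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{g}_0^{(i)} & \mathbf{g}_1^{(i)} & \cdots & \mathbf{g}_N^{(i)} & \cdots & \mathbf{0} \\ \vdots & & \ddots & & & \ddots & \vdots \\ \mathbf{0} & \cdots & \mathbf{0} & \mathbf{g}_0^{(i)} & \mathbf{g}_1^{(i)} & \cdots & \mathbf{g}_N^{(i)} \end{bmatrix}. \qquad (4)$$

It can be shown [2] that

$$\mathbf{g}_i^H \mathcal{H}_N \mathcal{H}_N^H \mathbf{g}_i = \mathbf{h}^H \mathcal{G}_i \mathcal{G}_i^H \mathbf{h}, \quad i = 1, \ldots, r.$$

Therefore

$$\mathbf{h}^H \left( \sum_{i=1}^{r} \mathcal{G}_i \mathcal{G}_i^H \right) \mathbf{h} = 0.$$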


In [2] it is also shown that the dimension of the null space of the matrix

$$\mathbf{C} = \sum_{i=1}^{r} \mathcal{G}_i \mathcal{G}_i^H$$

is one. This implies that the channel impulse response may be determined from the eigenvector of $\mathbf{C}$ corresponding to the zero eigenvalue, and that the solution is unique up to a multiplicative constant. Signal subspace eigenvectors may be employed as well [2].
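The identification steps reviewed in this section can be sketched numerically. The example below is an illustration only, under assumptions: the helper names `block_toeplitz` and `identify_channel`, the dimensions (P = 3, L = 2, N = 5) and the random channel are hypothetical, and the exact covariance with unit-variance i.i.d. symbols is used in place of an estimate. The noise eigenvectors are partitioned into P x 1 blocks, arranged into block-Toeplitz matrices, and the channel vector is taken as the eigenvector of the resulting matrix C with the smallest eigenvalue.

```python
import numpy as np

def block_toeplitz(blocks, n_rows):
    """Block row b holds the P x 1 blocks shifted right by b columns."""
    K, P = blocks.shape
    T = np.zeros((n_rows * P, K + n_rows - 1), dtype=complex)
    for b in range(n_rows):
        for k in range(K):
            T[b * P:(b + 1) * P, b + k] = blocks[k]
    return T

def identify_channel(cov, P, L, N):
    """Channel vector (up to a scalar) from the noise eigenvectors of cov."""
    r = (N + 1) * P - (L + N + 1)            # noise subspace dimension
    _, vecs = np.linalg.eigh(cov)            # eigenvalues in ascending order
    C = np.zeros(((L + 1) * P, (L + 1) * P), dtype=complex)
    for i in range(r):
        g = vecs[:, i].reshape(N + 1, P)     # rows are the blocks g_0, ..., g_N
        G = block_toeplitz(g, L + 1)         # (L+1)P x (L+N+1)
        C += G @ G.conj().T
    _, c_vecs = np.linalg.eigh(C)
    return c_vecs[:, 0]                      # smallest-eigenvalue eigenvector

rng = np.random.default_rng(0)
P, L, N, sigma2 = 3, 2, 5, 0.1
h = rng.standard_normal((L + 1, P)) + 1j * rng.standard_normal((L + 1, P))
H_N = block_toeplitz(h, N + 1)               # channel convolution matrix
cov = H_N @ H_N.conj().T + sigma2 * np.eye((N + 1) * P)
h_est = identify_channel(cov, P, L, N)
h_true = h.reshape(-1)                       # [h_0^T, ..., h_L^T]^T
cosine = abs(np.vdot(h_est, h_true)) / (np.linalg.norm(h_est) * np.linalg.norm(h_true))
```

Because the estimate is determined only up to a complex scalar, agreement with the true channel is checked through the normalized inner product `cosine`, which is close to one here.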


5.  ROBUST  SUBSPACE  ESTIMATION 

In practice the noise subspace eigenvectors have to be estimated from the available measurements $\mathbf{X}_1, \ldots, \mathbf{X}_K$. The estimation is conventionally done by using the eigenvectors of the sample covariance matrix

$$\hat{\mathbf{S}}_0 = \frac{1}{K} \sum_{i=1}^{K} \mathbf{X}_i \mathbf{X}_i^H.$$

Let $\hat{\mathbf{g}}_{i,0}$, $i = 1, \ldots, r$, be the eigenvectors of $\hat{\mathbf{S}}_0$ corresponding to the r smallest eigenvalues. An estimate of the channel vector may then be chosen to be the eigenvector corresponding to the smallest eigenvalue of

$$\hat{\mathbf{C}} = \sum_{i=1}^{r} \hat{\mathcal{G}}_i \hat{\mathcal{G}}_i^H, \qquad (5)$$

where the $\hat{\mathcal{G}}_i$ are defined from equations (3)-(4) with $\hat{\mathbf{g}}_{i,0}$ used in place of $\mathbf{g}_i$.

Let $\hat{\mathbf{g}}_{i,1}$, $i = 1, \ldots, r$, be the eigenvectors corresponding to the r smallest eigenvalues of the sample SCM. We now prove, assuming Gaussian noise, that these eigenvectors are convergent estimates of the noise subspace basis vectors. Therefore they may be used in any subspace-based identification method.

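Theorem 1 Assume that $\mathbf{X}_1, \ldots, \mathbf{X}_K$ are distributed as given in (2) and that the SIMO identifiability conditions hold. Assume further that the multivariate noise in (2) is complex circular Gaussian distributed, and denote the SCM of the $\mathbf{X}_n$'s by $\boldsymbol{\Sigma}_1$. Let $\hat{\mathbf{S}}_1$ be the sample SCM of the data. Set $\hat{\mathbf{g}}_{i,1}$, $i = 1, \ldots, r$, to be the eigenvectors of $\hat{\mathbf{S}}_1$ corresponding to the r smallest eigenvalues. Then:

(i) The r smallest eigenvalues of $\boldsymbol{\Sigma}_1 = E\{\hat{\mathbf{S}}_1\}$ are equal and the corresponding eigenvectors are orthogonal to the columns of the matrix $\mathcal{H}_N$.

(ii) As $K \to \infty$,

$$\hat{\mathbf{S}}_1 \xrightarrow{\text{wp1}} \boldsymbol{\Sigma}_1.$$

(iii) As $K \to \infty$,

$$\mathcal{H}_N^H \hat{\mathbf{g}}_{i,1} \xrightarrow{\text{wp1}} \mathbf{0}, \quad i = 1, \ldots, r.$$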

Proof. Result (i) follows from Theorem 2 in [4]. By using the i.i.d. assumption of the symbol sequence, result (ii) follows from Theorem 1.8.E in [7]. Result (iii) now follows from Theorem 3 in [4].

Note that part (i) of the above theorem proves that the channel coefficient vector may be identified from the theoretical SCM of $\mathbf{X}$ in (2). Part (ii) then gives the convergence of the sample SCM to the theoretical SCM. Finally, part (iii) states the convergence of the noise subspace eigenvectors. The efficiency and robust performance of the sample SCM based subspace estimation technique, also in non-Gaussian noise, are shown using simulations in the following section.


6.  SIMULATION  RESULTS 

In this section, we present simulation results illustrating the robustness of the channel identification using the noise subspace estimate obtained from the sample SCM. Moreover, we compare the performance to that of the identification method where the noise subspace estimate is obtained from the sample covariance matrix. The channel is estimated using the noise subspace method described earlier. In order to study robustness, ε-contamination and complex isotropic symmetric α-stable (SαS) noise models are considered.

The characteristic function of a complex isotropic SαS distribution is

$$\varphi(\omega) = \exp(-\gamma |\omega|^{\alpha}).$$

The smaller the characteristic exponent $\alpha \in (0, 2]$, the heavier the tails of the density (the case α = 2 corresponds to the Gaussian distribution). The positive-valued scalar γ is the dispersion of the distribution. The dispersion plays a role analogous to that of the variance for second-order processes [8].

In the ε-contaminated noise model, the noise is given by

$$v = (1 - b) v_1 + b v_2$$

where $b \sim \mathrm{Bin}(1, \varepsilon)$, $v_1 \sim \mathcal{N}_c(0, 1)$, $v_2 \sim \mathcal{N}_c(0, 1000)$.
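Samples from the ε-contaminated model can be drawn as sketched below. The function name `eps_contaminated_noise` and its arguments are hypothetical, and the circular complex Gaussian with variance $\sigma^2$ is generated by giving the real and imaginary parts variance $\sigma^2/2$ each.

```python
import numpy as np

def eps_contaminated_noise(size, eps, var_nominal=1.0, var_outlier=1000.0, rng=None):
    """Draw v = (1 - b) v1 + b v2 with b ~ Bernoulli(eps) and circular complex Gaussians."""
    rng = np.random.default_rng() if rng is None else rng

    def complex_gaussian(var):
        # Real and imaginary parts each carry half of the variance
        return np.sqrt(var / 2) * (rng.standard_normal(size) + 1j * rng.standard_normal(size))

    b = rng.random(size) < eps               # contamination indicator
    return np.where(b, complex_gaussian(var_outlier), complex_gaussian(var_nominal))

rng = np.random.default_rng(3)
v_clean = eps_contaminated_noise(100_000, eps=0.0, rng=rng)
v_cont = eps_contaminated_noise(100_000, eps=0.1, rng=rng)
power_clean = np.mean(np.abs(v_clean) ** 2)  # close to 1
power_cont = np.mean(np.abs(v_cont) ** 2)    # close to 0.9 * 1 + 0.1 * 1000 = 100.9
```

Even a contamination level of ε = 0.1 raises the average noise power by two orders of magnitude, which is what degrades the sample covariance matrix.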

As in Moulines et al. [2], the emitted signal is a random 4-QAM signal and the symbols are independent between successive time instants. The noise is independent of the signals and i.i.d. between the samples. The number of virtual channels is P = 4; the width of the temporal window is N = 10; the degree of the ISI is L = 4. The channel coefficients are given by [2]

$$\begin{aligned}
\mathbf{h}_0^T &= [(-0.049 + 0.359j),\ (0.443 - 0.0364j),\ (-0.221 - 0.322j),\ (0.417 + 0.030j)] \\
\mathbf{h}_1^T &= [(0.482 - 0.569j),\ (1),\ (-0.199 + 0.918j),\ (1)] \\
\mathbf{h}_2^T &= [(-0.556 + 0.587j),\ (0.921 - 0.194j),\ (1),\ (0.873 + 0.145j)] \\
\mathbf{h}_3^T &= [(1),\ (0.189 - 0.208j),\ (-0.284 - 0.524j),\ (0.285 + 0.309j)] \\
\mathbf{h}_4^T &= [(-0.171 + 0.061j),\ (-0.087 - 0.054j),\ (0.136 - 0.19j),\ (-0.049 + 0.161j)]
\end{aligned}$$

The number of independent Monte-Carlo runs used in the simulations is 100. Since the correct channel vector can be estimated only up to an arbitrary multiplicative constant, the performance criterion used in our simulations is the canonical angle between the estimated channel vector $\hat{\mathbf{h}}$ and the correct channel vector $\mathbf{h}$.


Figure 1: MSE of the canonical angle (in radians) in ε-contaminated noise. Solid lines: noise subspace estimated from the sample covariance matrix. Dashed lines: noise subspace estimated from the sample SCM.


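The canonical angle is defined as

$$\angle(\hat{\mathbf{h}}, \mathbf{h}) = \operatorname{acos}\left( \frac{|\hat{\mathbf{h}}^H \mathbf{h}|}{\|\hat{\mathbf{h}}\| \, \|\mathbf{h}\|} \right),$$

where $\|\cdot\|$ is the Euclidean vector norm. Note that $0 \le \angle(\hat{\mathbf{h}}, \mathbf{h}) \le \pi/2$. Moreover, $\angle(\hat{\mathbf{h}}, \mathbf{h}) = 0$ if and only if $\hat{\mathbf{h}} = c\mathbf{h}$, where c is a scalar constant.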
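The canonical angle is straightforward to evaluate numerically. In the sketch below the function name `canonical_angle` and the test vectors are hypothetical, and the cosine is clipped to [0, 1] as a guard against rounding before the inverse cosine is taken.

```python
import numpy as np

def canonical_angle(h_est, h):
    """Angle in [0, pi/2] between two complex vectors, invariant to scalar factors."""
    cosine = abs(np.vdot(h_est, h)) / (np.linalg.norm(h_est) * np.linalg.norm(h))
    return np.arccos(np.clip(cosine, 0.0, 1.0))

h = np.array([1 + 2j, -0.5j, 3.0])
angle_scaled = canonical_angle((2 - 3j) * h, h)                  # scalar multiple
angle_orth = canonical_angle(np.array([1, 0]), np.array([0, 1])) # orthogonal vectors
```

A complex scalar multiple of the true channel gives an angle of (numerically) zero, while orthogonal vectors give π/2.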

In our first simulation we compare the behavior of the two algorithms in ε-contaminated noise. The output SNR (as defined in [9]) between the signal part and the nominal noise part $v_1$ is 20 dB. Figure 1 shows the mean squared error of the canonical angle for the cases ε = 0, ε = 0.05 and ε = 0.1 and for the number of symbols $N_d$ = 250, 500, 1000, 2000, 5000. The MSE is calculated by

$$\mathrm{MSE} = \frac{1}{N_m} \sum_{i=1}^{N_m} \angle(\hat{\mathbf{h}}_i, \mathbf{h})^2,$$

where $N_m$ is the number of Monte-Carlo realizations and $\hat{\mathbf{h}}_i$ is the estimate from the ith realization. In the Gaussian case, the behavior of the two methods is in practice equal for $N_d > 2000$. For small sample sizes the method based on the sample covariance has smaller MSE. As expected, when ε > 0, the method based on the sample SCM has better performance than the method based on the sample covariance matrix.

Figure 2 shows the simulation results for α-stable noise. The values used for the characteristic exponent are α = 2, α = 1.5 and α = 1. The value for the dispersion is γ = 1 (in the Gaussian case the output SNR is 20 dB). As in the previous simulation, the number of symbols used in the estimation task is $N_d$ = 250, 500, 1000, 2000, 5000. Naturally the simulation results for Gaussian noise are the same as in the previous simulation. When the noise is more heavy-tailed than Gaussian noise, the method based on the SCM clearly outperforms the method based on the covariance matrix. Note that when α < 2 the probability of having extremely deviating noise samples in the data grows as a function of the number of samples $N_d$. Therefore the MSE of the method employing the sample covariance matrix also grows as a function of $N_d$.


Figure 2: MSE of the canonical angle (in radians), plotted against the number of symbols, in α-stable noise. Solid lines: noise subspace estimated from the sample covariance matrix. Dashed lines: noise subspace estimated from the sample SCM.


7.  CONCLUSION 

In this paper we show how blind channel identification may be done in a robust manner by using the sample SCM. The simulation results imply that the proposed method performs reliably also in heavy-tailed noise, whereas the method based on the sample covariance matrix is sensitive to deviations from Gaussian noise. The calculation of the sample SCM is straightforward, and therefore the methods based on the SCM have approximately the same computational complexity as the methods based on the sample covariance matrix.
ABSTRACT

This contribution deals with a particular family of blind system identification techniques, referred to as the Minimum Noise Subspace (MNS) method. The MNS method is a computationally fast version of the subspace method. Here, we develop an orthogonal version of the MNS method. The Orthogonal Minimum Noise Subspace (OMNS) method is computationally more efficient than the standard subspace method, and is more robust to channel noise than MNS.

I.  INTRODUCTION 

Recently, a new subspace method called Minimum Noise Subspace (MNS) has been proposed for Multiple-Input Multiple-Output (MIMO) system identification [1], [2]. This method computes the noise subspace via a set of noise vectors which are computed in parallel from a set of combinations of system outputs that form a basis of the rational noise subspace.

In this contribution, an orthogonal version of MNS called Orthogonal Minimum Noise Subspace (OMNS) is proposed. Here, the noise subspace is formed through computation of noise vectors that correspond to an orthogonal set of noise polynomial vectors (an orthogonal basis of the rational noise subspace). The OMNS method is more robust to channel noise than the MNS method.

This paper is organized as follows. The system model and general assumptions are introduced in section II. Section III describes the general subspace method. In section IV we derive the basic ideas of both the MNS and OMNS methods by applying the rational subspace formalism. The OMNS algorithm is described in section V. In section VI, both methods are compared in terms of computational complexity and estimation accuracy. In section VII, the computer simulations are presented.

II.  SYSTEM  MODEL 

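Let $\mathbf{y}(n)$ be a q-variate discrete-time stationary time series given by:

$$\mathbf{y}(n) = \sum_{k=0}^{M} \mathbf{H}(k) \mathbf{s}(n-k) + \mathbf{w}(n) = [\mathbf{H}(z)] \mathbf{s}(n) + \mathbf{w}(n) \qquad (1)$$

where

$$\mathbf{H}(z) = \sum_{k=0}^{M} \mathbf{H}(k) z^{-k} \triangleq \begin{bmatrix} h_{1,1}(z) & \cdots & h_{1,p}(z) \\ \vdots & & \vdots \\ h_{q,1}(z) & \cdots & h_{q,p}(z) \end{bmatrix}$$

$\mathbf{H}(z)$ is an unknown causal FIR q x p transfer function with q > p, $\mathbf{s}(n) = [s_1(n), \ldots, s_p(n)]^T$ is a p-dimensional unknown process and $\mathbf{w}(n)$ is an additive q-dimensional white noise, i.e. $E[\mathbf{w}(n)\mathbf{w}^*(n)] = \sigma^2 \mathbf{I}_q$. In other words, (1) describes a p-input, q-output system.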

In the communication context, the input sequence $\mathbf{s}(n)$ denotes the transmitted symbols, and the unknown FIR transfer function $\mathbf{H}(z)$ models the propagation channel between sources and sensors.

We study here the estimation of $\mathbf{H}(z)$ from the observation $\mathbf{y}(n)$ under the following assumptions:

$$\operatorname{rank}(\mathbf{H}(z)) = p \ \text{for each } z \qquad (2)$$

$$\mathbf{H}(M) \ \text{is full column rank} \qquad (3)$$

In fact, (3) can be relaxed by simply assuming $\mathbf{H}(z)$ to be column-reduced [5].

III.  SUBSPACE  METHOD 

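Here, we present a brief review of the original subspace method. Let $\mathbf{y}_i(n)$ be a vector of N successive samples from the i-th output of the system. According to (1), it can be written as:

$$\mathbf{y}_i(n) = [y_i(n), \ldots, y_i(n-N+1)]^T = \mathcal{T}_N(\mathbf{H}_{i,:}) \mathbf{s}(n) + \mathbf{w}_i(n) \qquad (4)$$

Here $\mathbf{s}(n)$ denotes the vector of input samples, i.e. $\mathbf{s}(n) = [\mathbf{s}_1^T(n), \ldots, \mathbf{s}_p^T(n)]^T$, where $\mathbf{s}_j(n) = [s_j(n), \ldots, s_j(n-N-M+1)]^T$ for $1 \le j \le p$, and $\mathbf{w}_i(n) = [w_i(n), \ldots, w_i(n-N+1)]^T$. $\mathcal{T}_N(\mathbf{H}_{i,:})$ is the N x p(N + M) block Sylvester matrix given by:

$$\mathcal{T}_N(\mathbf{H}_{i,:}) = [\mathcal{T}_N(h_{i,1}), \ldots, \mathcal{T}_N(h_{i,p})]$$

where $\mathcal{T}_N(h_{i,j})$ denotes the N x (N + M) Sylvester matrix associated with $h_{i,j}$ [3]. Stacking all of the outputs of the system into a vector $\mathbf{y}(n)$, we obtain:

$$\mathbf{y}(n) = [\mathbf{y}_1^T(n), \ldots, \mathbf{y}_q^T(n)]^T = \mathcal{T}_N(\mathbf{H}) \mathbf{s}(n) + \mathbf{w}(n) \qquad (5)$$

with

$$\mathbf{w}(n) = [\mathbf{w}_1^T(n), \ldots, \mathbf{w}_q^T(n)]^T$$

$$\mathcal{T}_N(\mathbf{H}) = [\mathcal{T}_N^T(\mathbf{H}_{1,:}), \ldots, \mathcal{T}_N^T(\mathbf{H}_{q,:})]^T \qquad (6)$$

$\mathcal{T}_N(\mathbf{H})$ is the qN x p(N + M) generalized Sylvester matrix of order N associated with $\mathbf{H}(z)$. Let $\mathbf{R}_N$ be the qN x qN covariance matrix of $\mathbf{y}(n)$:

$$\mathbf{R}_N = E[\mathbf{y}(n)\mathbf{y}^*(n)] = \mathcal{T}_N(\mathbf{H}) \mathbf{S} \mathcal{T}_N^*(\mathbf{H}) + \sigma^2 \mathbf{I}_{qN} \qquad (7)$$

where $\mathbf{S} = E[\mathbf{s}(n)\mathbf{s}^*(n)] > 0$. Under assumptions (2) and (3), and for N > pM, $\mathcal{T}_N(\mathbf{H})$ has full column rank p(N + M). Therefore, $\mathbf{R}_N$ can be written as follows:

$$\mathbf{R}_N = \mathbf{U}_s \boldsymbol{\Lambda}_s \mathbf{U}_s^* + \sigma^2 \mathbf{U}_n \mathbf{U}_n^* \qquad (8)$$

where $\mathbf{U}_s = [\mathbf{u}_1, \ldots, \mathbf{u}_{p(N+M)}]$ denotes the signal eigenvectors, $\mathbf{U}_n = [\mathbf{u}_{p(N+M)+1}, \ldots, \mathbf{u}_{qN}]$ denotes the noise eigenvectors, and $\boldsymbol{\Lambda}_s = \operatorname{diag}(\lambda_1, \ldots, \lambda_{p(N+M)})$, with $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_{p(N+M)} > \sigma^2$, contains the signal eigenvalues. It can be shown that $\operatorname{range}(\mathbf{U}_s) = \operatorname{range}(\mathcal{T}_N(\mathbf{H}))$ and $\operatorname{range}(\mathbf{U}_n) = \operatorname{range}(\mathcal{T}_N(\mathbf{H}))^{\perp}$, i.e. the orthogonal complement of the range space of $\mathcal{T}_N(\mathbf{H})$. The orthogonality between the noise and signal subspaces leads to:

$$\mathbf{U}_n^* \mathcal{T}_N(\mathbf{H}) = \mathbf{0} \qquad (9)$$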

In the original version of the subspace method [3], [4], the computation of the whole set of $qN - p(N + M)$ noise vectors is required to estimate the channel parameters by minimizing $\|\mathbf{U}_n^* \mathcal{T}_N(\mathbf{H})\|^2$ under a suitable constraint.
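The structure of the generalized Sylvester matrix and the orthogonality relation (9) can be illustrated numerically. The sketch below makes several assumptions for the sake of a small example: the helper names `sylvester` and `generalized_sylvester`, the sizes (q = 4, p = 2, M = 2, N = 5), a random channel, an identity input covariance and a noise variance of 0.1 are all hypothetical choices.

```python
import numpy as np

def sylvester(h, N):
    """N x (N+M) Sylvester matrix of a scalar FIR filter h = [h(0), ..., h(M)]."""
    M = len(h) - 1
    T = np.zeros((N, N + M), dtype=complex)
    for r in range(N):
        T[r, r:r + M + 1] = h            # row r produces the output sample y(n - r)
    return T

def generalized_sylvester(H, N):
    """H: (M+1, q, p) array of taps H(k). Returns the qN x p(N+M) matrix."""
    _, q, p = H.shape
    return np.block([[sylvester(H[:, i, j], N) for j in range(p)] for i in range(q)])

rng = np.random.default_rng(4)
q, p, M, N = 4, 2, 2, 5                  # N > pM, as required for full column rank
H = rng.standard_normal((M + 1, q, p)) + 1j * rng.standard_normal((M + 1, q, p))
T = generalized_sylvester(H, N)

# Covariance (7) with S = I and sigma^2 = 0.1, then its noise eigenvectors
R = T @ T.conj().T + 0.1 * np.eye(q * N)
eigvals, U = np.linalg.eigh(R)           # eigenvalues in ascending order
n_noise = q * N - p * (N + M)
U_n = U[:, :n_noise]
residual = np.linalg.norm(U_n.conj().T @ T)
```

Here `residual` is numerically zero, in line with (9), and the `n_noise` smallest eigenvalues all equal the assumed noise variance.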

IV.  RATIONAL  SUBSPACE  AND  POLYNOMIAL 
BASES 

The subspace identification method can be recast in a more general framework by resorting to the concept of rational subspaces. As we shall see below, one can express the signal and noise subspaces in the field of rational functions to get more insight into the subspace method.

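A qN x 1 vector $\mathbf{g} = [\mathbf{g}^T(0), \ldots, \mathbf{g}^T(N-1)]^T$ (where each vector $\mathbf{g}(k)$ is q x 1) belongs to the noise subspace of $\mathbf{R}_N$ if and only if $\mathbf{g}^* \mathcal{T}_N(\mathbf{H}) = \mathbf{0}$. The orthogonality condition is conveniently rewritten as:

$$\mathbf{g}^* \mathcal{T}_N(\mathbf{H}) = \mathbf{0} \iff \mathbf{g}^*(z) \mathbf{H}(z) = \mathbf{0} \ \text{for each } z$$

$$\mathbf{g}(z) = \sum_{k=0}^{N-1} \mathbf{g}(k) z^{-k} \quad \text{and} \quad \mathbf{g}^*(z) = \sum_{k=0}^{N-1} \mathbf{g}^*(k) z^{-k}$$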

We denote by $\mathbb{C}^q(z)$ the set of all q-dimensional rational functions or, in other words, the q-dimensional vector space over the field of all scalar rational functions. Such a vector space is referred to as a rational space. Let $\mathcal{S}$ be the p-dimensional rational subspace of $\mathbb{C}^q(z)$ spanned by the column vectors of $\mathbf{H}(z)$ ($\mathcal{S} = \operatorname{range}(\mathbf{H}(z))$). Let $\mathcal{B} \subset \mathbb{C}^q(z)$ denote the orthogonal complement of $\mathcal{S}$ (i.e., the subspace of all q-dimensional rational transfer functions $\mathbf{g}(z)$ satisfying $\mathbf{g}(z)\mathbf{f}^*(z) = 0$ for each $\mathbf{f}(z) \in \mathcal{S}$). It then follows that $\mathcal{B}$ has dimension q - p.

The subspace method can now be seen as a method of finding $\mathbf{H}(z)$ such that $\mathbf{H}(z) \perp \mathcal{B}$. However, $\mathcal{B}$ can be uniquely spanned by a basis of q - p q-dimensional polynomial vectors. Therefore, to identify $\mathbf{H}(z)$, it suffices to find a polynomial basis $\mathbf{V}(z) = [\mathbf{v}_1(z), \ldots, \mathbf{v}_{q-p}(z)]$ of $\mathcal{B}$ and to express the orthogonality between $\mathbf{v}_i(z)$ and $\mathbf{H}(z)$ for $i = 1, \ldots, q - p$, i.e.

$$\mathbf{v}_i^*(z) \mathbf{H}(z) = \mathbf{0}.$$

V.  ORTHOGONAL  MINIMUM  NOISE  SUBSPACE 
METHOD 

In [1], [2], an estimation method called MNS¹ has been introduced to compute the polynomial basis of $\mathcal{B}$. Each polynomial noise vector is obtained from the least eigenvector of a covariance matrix computed from (properly chosen) (p + 1)-dimensional sub-system outputs.

In this contribution, an alternative method to compute the noise polynomial basis $\mathbf{V}(z) = [\mathbf{v}_1(z), \ldots, \mathbf{v}_{q-p}(z)]$ is proposed. The noise vectors are computed (i) recursively (contrary to the MNS vectors in [2], which can be computed in a parallel scheme), (ii) using all system outputs (in [2] each vector was computed using only p + 1 system outputs), and (iii) in such a way as to form an orthogonal basis of $\mathcal{B}$ (this is not the case in [2]), i.e.

$$\mathbf{v}_i^*(z) \mathbf{v}_j(z) = 0 \ \text{for } i \neq j \qquad (10)$$

At the i-th step, we compute a q-dimensional polynomial noise vector $\mathbf{v}_i(z)$ orthogonal to $\mathbf{H}(z)$ and to the previously computed q-dimensional polynomial noise vectors. Each noise vector is obtained by computing the least eigenvector of a $qN_i \times qN_i$ ($i = 1, \ldots, q - p$) matrix which is a function of the channel outputs and the previously computed polynomial noise vectors. $N_i$ is chosen in order to obtain a tall block matrix at each step.

More  precisely,  we  have  the  following  algorithm. 

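1. Initialization:

• Choose a window length $N_1$ such that $qN_1 > p(M + N_1)$ and estimate the covariance matrix $\mathbf{R}_{N_1}$ from the observations.

• Compute $\mathbf{v}_1$ as the least eigenvector of $\mathbf{R}_{N_1}$; the latter satisfies:

$$\mathbf{v}_1^* \mathcal{T}_{N_1}(\mathbf{H}) = \mathbf{0} \iff \mathbf{v}_1^*(z) \mathbf{H}(z) = \mathbf{0} \qquad (11)$$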

1  It  is  minimum  in  the  sense  that  q  —  p  is  the  minimum  number  of  noise 
vectors  needed  to  uniquely  estimate  H(z). 
where $\mathbf{v}_1(z) = \sum_{k=0}^{N_1 - 1} \mathbf{v}_1(k) z^{-k}$ with $\mathbf{v}_1 = [\mathbf{v}_1^T(0), \ldots, \mathbf{v}_1^T(N_1 - 1)]^T$.

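2. For $i = 2, \ldots, q - p$:

• Choose a window length $N_i$ such that:

$$qN_i > p(M + N_i) + \sum_{j=1}^{i-1} (N_j - 1) \qquad (12)$$

and then compute the matrix:

$$\mathbf{M}_i = \mathbf{R}_{N_i} + \sum_{j=1}^{i-1} \mathcal{T}_{N_i}(\mathbf{v}_j) \mathcal{T}_{N_i}^*(\mathbf{v}_j) \qquad (13)$$

• Compute $\mathbf{v}_i$ as the least eigenvector of $\mathbf{M}_i$. The latter satisfies:

$$\mathbf{v}_i^* \mathcal{T}_{N_i}(\mathbf{H}) = \mathbf{0}, \qquad \mathbf{v}_i^* \mathcal{T}_{N_i}(\mathbf{v}_j) = \mathbf{0} \ \text{for } j = 1, \ldots, i - 1 \qquad (14)$$

or equivalently:

$$\mathbf{v}_i^*(z) \mathbf{H}(z) = \mathbf{0}, \qquad \mathbf{v}_i^*(z) \mathbf{v}_j(z) = 0 \ \text{for } j = 1, \ldots, i - 1 \qquad (15)$$

where $\mathbf{v}_i(z) = \sum_{k=0}^{N_i - 1} \mathbf{v}_i(k) z^{-k}$ and $\mathbf{v}_i = [\mathbf{v}_i^T(0), \ldots, \mathbf{v}_i^T(N_i - 1)]^T$.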

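3. Once the (q - p) noise vectors are computed, estimate the channel matrix $\mathbf{H}(z)$ (up to a constant nonsingular p x p matrix) as:

$$\hat{\mathbf{H}}(z) = \arg\min_{\mathbf{H}(z)} \sum_{i=1}^{q-p} \|\mathbf{v}_i^* \mathcal{T}_{N_i}(\mathbf{H})\|^2 = \arg\min_{\mathbf{H}(z)} \|\mathbf{V}^* \mathcal{T}_{N_{q-p}}(\mathbf{H})\|^2 \qquad (16)$$

where $\mathbf{V} = [\tilde{\mathbf{v}}_1, \ldots, \tilde{\mathbf{v}}_{q-p}]$ with $\tilde{\mathbf{v}}_i = [\mathbf{v}_i^T, \mathbf{0}_{1, q(N_{q-p} - N_i)}]^T$ ($\mathbf{0}_{i,j}$ being the i x j all-zero matrix). The minimization in (16) is done under a suitable constraint, as shown in [7].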


VI.  DISCUSSION 
Computational  complexity: 

The computational cost of the MNS method is $O((q - p)(p + 1)^2 N^2)$ flops, compared with $O(\sum_{i=1}^{q-p} (qN_i)^2)$ flops for the OMNS method and $O((qN)^3)$ flops for the subspace method². Therefore, the MNS method has the least computational complexity. The above computation does not take into account the parallel structure of the MNS method, which is an additional advantage of this method. However, the OMNS method remains less computationally complex than the subspace method. It is shown later that for a large number of sensors and for small M, the value of $N_i$ for OMNS is small and remains constant for several iterations, which results in a computational cost comparable to, or sometimes much less than, that of the MNS method. Table I provides some examples of the window lengths used in MNS and OMNS as a function of the system parameters q, p and M.

2 These computational costs do not include the cost of covariance matrix estimation, i.e. the estimation of $E[\mathbf{y}(n+k)\mathbf{y}^*(n)]$, $k = 0, \ldots, M$, which is the same for all considered methods. They also assume that the algorithm implementations are optimized in the sense that they take advantage of the underlying Toeplitz structures to reduce computational complexity.

Performance: 

As mentioned before, each noise polynomial vector in the MNS method is obtained using only p + 1 outputs [2], while in the OMNS method each noise vector is computed from all q system outputs. This leads to an improved (more robust) channel estimation, especially when the number of system outputs is much larger than the number of system inputs. Furthermore, the orthogonality of the noise subspace might improve the quality of the parameter estimation. This has been demonstrated for other subspace-based applications, e.g. source localization [8], but a performance analysis needs to be performed to assess this point in the context of MIMO system identification.

VII.  SIMULATION  RESULTS 

In this section, the performance of the MNS method is
compared with that of the OMNS method via simulation
results. We consider p = 2 inputs where each input sequence
is an i.i.d., zero-mean, unit-variance QAM4 process. Both
the MNS and OMNS methods estimate the polynomial matrix
H(z) up to a p x p constant matrix Q. The output observation
noise is a sequence of i.i.d., zero-mean, Gaussian variables
and the number of samples is held constant (T = 500).
For each experiment $N_r = 100$ independent Monte-Carlo
runs are performed. The performance is measured by the
mean-square error (MSE) defined by:

$$\mathrm{MSE} = \frac{1}{N_r}\sum_{r=1}^{N_r} \|H_r Q_r - H\|^2$$

where $H_r$ is an estimate of $H = [H(0) \ldots H(M)]^T$ at the r-th
run, and $Q_r$ is chosen so that $\|H_r Q_r - H\|$ is minimum
(this removes the constant-matrix indeterminacy).
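The alignment step in the MSE definition can be sketched as follows. This is a minimal illustration, assuming that $Q_r$ is taken as the least-squares minimizer of $\|H_r Q - H\|$ in the Frobenius norm; the function names `aligned_error` and `mse` are hypothetical and not part of the original experiments.

```python
import numpy as np

def aligned_error(H_est, H_true):
    """Return ||H_est Q - H_true||_F^2 with Q the least-squares minimizer."""
    # Q is the p x p matrix solving min_Q ||H_est Q - H_true||_F
    Q, *_ = np.linalg.lstsq(H_est, H_true, rcond=None)
    return float(np.linalg.norm(H_est @ Q - H_true) ** 2)

def mse(estimates, H_true):
    """Average the aligned squared error over the Monte-Carlo runs."""
    return float(np.mean([aligned_error(H, H_true) for H in estimates]))
```

An estimate that differs from the true channel only by an invertible p x p matrix therefore yields an error of (numerically) zero, which is the intended effect of removing the indeterminacy.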
The channel transfer functions associated with the first
output correspond to the same impulse response:

$$h_{1,1}(k) = h_{1,2}(k) = \cdots = \lambda_0\, g(kT_s) + \lambda_1\, g(kT_s - \tau_1) + \cdots + \lambda_L\, g(kT_s - \tau_L)$$

L  denotes  the  number  of  paths  and  g(t)  is  generated  from 
the  raised  cosine  spectrum  pulse  with  the  roll-off  factor 
TABLE I
Channel parameters and length of the processing window used in experiments.

         Channel parameters     N_i (OMNS)            N (MNS)
Fig. 1   10 x 2 and M = 2       (1,1,1,1,1,2,2,4)     5
Fig. 2   6 x 2 and M = 2        (2,2,4,10)            5
Fig. 3   4 x 2 and M = 2        (3,7)                 5
Fig. 4   4 x 2 and M = 5        (6,16)                11


equal to 1/2. It is then delayed and sampled at the rate
of 270 kb/s ($T_s = 3.7\ \mu$s). The resulting channel impulse
response is windowed such that the polynomial degree is
M. $\tau_i$ denotes the delay and $\lambda_i$ the attenuation. The attenuation
is taken equal to -5 dB for all of the paths and the delay
$\tau_i$ is a multiple of the path number, i.e. $\tau_i = i \times 3.2\ \mu$s
(for $i = 1, \ldots, L$).
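The multipath impulse response described above can be sketched as follows. This is only an illustration under several assumptions that the text does not state: the direct-path gain $\lambda_0$ is taken as 1, the -5 dB attenuation is interpreted as an amplitude ratio ($10^{-5/20}$), and no additional bulk delay is applied before windowing. The function names are hypothetical.

```python
import numpy as np

def raised_cosine(t, T, beta):
    """Raised-cosine pulse g(t) with symbol period T and roll-off beta."""
    x = np.asarray(t, dtype=float) / T
    denom = 1.0 - (2.0 * beta * x) ** 2
    singular = np.isclose(denom, 0.0)
    safe = np.where(singular, 1.0, denom)  # avoid division by zero
    g = np.sinc(x) * np.cos(np.pi * beta * x) / safe
    # Limit value of the pulse at t = +/- T / (2 beta)
    limit = (np.pi / 4.0) * np.sinc(1.0 / (2.0 * beta))
    return np.where(singular, limit, g)

def multipath_channel(M, L, Ts=3.7e-6, beta=0.5, delay_step=3.2e-6,
                      att_db=-5.0, lam0=1.0):
    """Taps h(k), k = 0..M, of lam0*g(kTs) + sum_i lam*g(kTs - tau_i)."""
    k = np.arange(M + 1)
    lam = 10.0 ** (att_db / 20.0)      # assumed amplitude attenuation
    h = lam0 * raised_cosine(k * Ts, Ts, beta)
    for i in range(1, L + 1):
        tau_i = i * delay_step         # delay proportional to path number
        h = h + lam * raised_cosine(k * Ts - tau_i, Ts, beta)
    return h
```

For example, `multipath_channel(M=2, L=3)` returns the three taps of a degree-2 channel with three delayed paths.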

The other channel transfer functions are generated by
assuming a plane-wave propagation model for each path, with
corresponding electric angles uniformly distributed in $[0, \pi/2]$,
(i.e.  hj,i(z)  =  J2^=o  ^Lff(kTs  -  Tt)eJWl z~f  with  0,/  € 
[0,  tt/2]). 

Figures (1) to (4) show the comparative performances of
MNS and OMNS for different choices of the channel parameters
q, p and M. In the figures the MSE of the channel parameter
estimates is plotted against the SNR, defined as the inverse
noise power. As expected, the performance gain of OMNS
is more significant when q - p is large. For small q - p and
large channel degree (Fig. 4) the performance of OMNS is
slightly degraded in comparison with that of MNS. This is
possibly due (although this needs to be confirmed by a theoretical
study) to the large window sizes that are needed to compute the
OMNS basis, leading to a large number of parameters (here
the noise vector coefficients) to be estimated.
ABSTRACT

This paper addresses the problem of semi-blind Multi-Input
Multi-Output (MIMO) equalization of time-varying
channels by employing recursive filtering methods. Based
on a realistic channel model described in the COST 207
project, we derive a state-space model that characterizes
the behavior of the channel in time. The channel
estimation and tracking are performed using a Kalman
filter method, and a decision feedback equalizer derived
using the MMSE criterion is used to perform the
equalization.

1.  INTRODUCTION 

MIMO  channels  with  Intersymbol  Interference  (ISI)  and 
Inter-User  Interference  (IUI)  arise  in  many  applications 
including  wireless  communications.  In  addition,  the 
time-varying  nature  of  the  wireless  channels  makes  the 
equalization  even  more  difficult  to  achieve.  Deriving  a 
model  that  describes  the  system’s  time  evolution  can 
be  a  very  difficult  task  taking  into  account  that  the 
time-varying  parameters  are  not  directly  observed  and 
the  model  has  to  be  realistic. 

In this paper we derive a semi-blind MIMO algorithm
capable of identifying, tracking and equalizing a
Time-Varying Channel (TVC). Semi-blind algorithms
need some training data for channel acquisition and
then they run blindly. The advantages of the estimation
and tracking stages can be summarized as follows:
the estimator is akin to the usual Kalman filter and
is thus an exact solution to the estimation problem.
Combining this structure with a multichannel Decision
Feedback Equalizer (DFE) we get a true real-time
algorithm in the sense that it is recursive in time and
the storage space needed to evaluate the estimates
remains constant as time progresses and the amount of
received data increases. This means that it is also
feasible to cope with a large number of parameters. Another

*This work was supported by Nokia and the Academy of
Finland.


method combining a DFE and a Kalman filter was proposed in
[2]. In our paper the measurement and process noise
variances used in the Kalman filter are estimated using the
received data [1] and a new MIMO structure for the DFE
is derived based on the MMSE criterion. Simulations are
carried out using realistic channels.

This  paper  is  organized  as  follows.  The  system 
model  is  presented  first.  Then  a  description  of  the 
proposed  algorithm  is  given.  In  section  4,  simulation 
results  of  equalization  for  realistic  MIMO  channels  are 
presented. 

2.  SYSTEM  MODEL 

Let us consider a MIMO system with m source signals and
n sensors at the receiver. The received observations
from sensor j (with $j = 1, \ldots, n$) at time t are given by:

$$y_j(t) = \sum_{i=1}^{m} \sum_{l=0}^{L_{ij}-1} h_{ij}(l)\, x_i(t-l) + v_j(t) \tag{1}$$

where $x_i(t-l)$ is the symbol drawn from a constellation
X of the i-th user at time $t-l$, $h_{ij}(l)$ is the impulse
response of the TVC, $y_j(t)$ is the received signal, and
$v_j(t)$ is the additive Gaussian noise with variance $\sigma_v^2$.
Setting $L = \max L_{ij}$, the channel length, we obtain the
following vector form:

$$\mathbf{y}(t) = \sum_{l=0}^{L} \mathbf{H}_l(t)\, \mathbf{x}(t-l) + \mathbf{v}(t) \tag{2}$$

where $\mathbf{y}$ is a column vector of n received signals, $\mathbf{x}$ is
a column vector of m transmitted signals, $\mathbf{H}_l(t)$ is an
n x m matrix containing the channel taps and $\mathbf{v}$ is an
additive noise vector. For simplicity, a 2 x 2 MIMO
model is presented in Figure 1.

Let us assume that the j-th received signal is a
superposition of $N_p$ paths. The resulting channel
impulse response can then be described using the Gaussian
distributed Wide-Sense Stationary with Uncorrelated
Scattering (WSSUS) model [6]:
Figure 1: The (2,2) MIMO system model


$$h(t,\tau) = \frac{1}{\sqrt{N_p}} \sum_{p=1}^{N_p} e^{j(2\pi f_{d,p} t + \theta_p)}\, h_{RF}(\tau - \tau_p) \tag{3}$$

where $f_{d,p}$ is the Doppler spread, $\theta_p$ is the angular
spread, $\tau_p$ is the delay spread of the path p and $h_{RF}(t)$
is the impulse response of the receive filter.

Four  propagation  environments  are  widely  used:  Typ¬ 
ical  Urban  (TU),  Bad  Urban  (BU),  Hilly  Terrain  (HT) 
and  Rural  Area  (RA),  each  of  them  having  specific  pa¬ 
rameter  values.  This  model  is  suitable  for  many  chan¬ 
nels  of  practical  interest  in  mobile  wireless  communi¬ 
cations,  which  is  our  concern  in  this  paper. 

Considering that the m-dimensional transmitted sequence
is a white sequence drawn from a PSK constellation,
and that the n-dimensional received signal is sampled
at the symbol rate, we get the following discrete-time
model:

$$\mathbf{y}(k) = X(k)\mathbf{h}(k) + \mathbf{v}(k) \tag{4}$$

where $X(k)$ is an n x nmL data matrix defined as:

$$X(k) = [\,x_1(k)I_m \ \ldots\ x_n(k)I_m \ \ldots\ x_1(k-L+1)I_m \ \ldots\ x_n(k-L+1)I_m\,] \tag{5}$$

$I_m$ is an m x m identity matrix and

$$\mathbf{h}(k) = [\,h_{11}^{0}(k) \ldots h_{1n}^{0}(k) \ldots h_{m1}^{0}(k) \ldots h_{mn}^{0}(k) \ \ldots\ h_{11}^{L-1}(k) \ldots h_{1n}^{L-1}(k) \ldots h_{m1}^{L-1}(k) \ldots h_{mn}^{L-1}(k)\,]^T \tag{6}$$

is a vector of length nmL containing the channel
coefficients.

3.  RECURSIVE  ALGORITHM 

The proposed algorithm consists of two stages: first we
deal with channel acquisition using training data, and
in the second stage we perform channel tracking and
equalization. This type of structure allows for real-time
implementation.


3.1. Channel acquisition

In this section we are interested in estimating the channel
coefficients $h_{ij}(l, k)$ using limited training data. Our
algorithm is based on the well-known Kalman filter [5].
In matrix notation we have the following state-space
equations:

$$\mathbf{y}(k) = X(k)\mathbf{h}(k) + \mathbf{v}(k) \tag{7}$$

$$\mathbf{h}(k) = A\mathbf{h}(k-1) + \mathbf{w}(k) \tag{8}$$

where $X(k)$ contains the transmitted symbols, $\mathbf{h}(k)$ are the
channel taps at time instant k and A is the state transition
matrix, in our case an identity matrix. The noises
$\mathbf{v}$ and $\mathbf{w}$ are mutually uncorrelated white noise sequences
with covariance matrices R and Q. These covariance
matrices may be estimated prior to performing
the equalization, based on the whiteness property of the
innovation sequence in optimum Kalman filtering [1].

During the training period the transmitted symbols
are known to the receiver. Let us denote them by
$X_{\text{training}}$, constructed according to (5). The Kalman filter
equations can be summarized as follows:

$$\begin{aligned}
\hat{\mathbf{h}}(k|k-1) &= A\,\hat{\mathbf{h}}(k-1|k-1) \\
P(k|k-1) &= A\,P(k-1|k-1)\,A^T + Q \\
K(k) &= P(k|k-1)\,X_{\text{training}}^T \left[ X_{\text{training}}\,P(k|k-1)\,X_{\text{training}}^T + R \right]^{-1} \\
\hat{\mathbf{h}}(k|k) &= \hat{\mathbf{h}}(k|k-1) + K(k)\left( \mathbf{y}(k) - X_{\text{training}}\,\hat{\mathbf{h}}(k|k-1) \right) \\
P(k|k) &= P(k|k-1) - K(k)\,X_{\text{training}}\,P(k|k-1)
\end{aligned} \tag{9}$$

Thus, the filtered estimates of the channel taps at time
instant k are given by $\hat{\mathbf{h}}(k|k)$.
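One prediction/update cycle of the recursion (9) can be sketched as below. This is a minimal illustration, not the authors' implementation: the conjugate transpose is used so that the same code also covers complex data (it reduces to the plain transpose for real data), and the demo uses a random +/-1 data matrix rather than the structured matrix of (5). The names `kalman_step` and `demo` are hypothetical.

```python
import numpy as np

def kalman_step(h_prev, P_prev, X, y, A, Q, R):
    """One prediction/update cycle of the channel-tap Kalman filter."""
    # Prediction of the state and of its error covariance
    h_pred = A @ h_prev
    P_pred = A @ P_prev @ A.conj().T + Q
    # Kalman gain
    S = X @ P_pred @ X.conj().T + R
    K = P_pred @ X.conj().T @ np.linalg.inv(S)
    # Measurement update with the known (training) data matrix X
    h_new = h_pred + K @ (y - X @ h_pred)
    P_new = P_pred - K @ X @ P_pred
    return h_new, P_new

def demo(num_steps=200, seed=0):
    """Estimate a static 4-tap channel from 2 noisy outputs per step."""
    rng = np.random.default_rng(seed)
    n_out, n_taps = 2, 4
    h_true = rng.standard_normal(n_taps)
    A = np.eye(n_taps)                  # identity state transition
    Q = 1e-8 * np.eye(n_taps)           # assumed process noise
    R = 1e-4 * np.eye(n_out)            # assumed measurement noise
    h, P = np.zeros(n_taps), np.eye(n_taps)
    for _ in range(num_steps):
        X = rng.choice([-1.0, 1.0], size=(n_out, n_taps))
        y = X @ h_true + 1e-2 * rng.standard_normal(n_out)
        h, P = kalman_step(h, P, X, y, A, Q, R)
    return float(np.linalg.norm(h - h_true))
```

Only the previous estimate and covariance are stored between calls, which is what keeps the memory requirement constant as more data arrive.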

3.2.  Equalization 

In the previous section, we described how to estimate
the channel taps. The remaining task is to perform
equalization in order to get estimates of the desired
symbols. A MIMO DFE based on the MMSE criterion
is derived. Let us start by defining the channel
convolution matrices $H_{ij}$ of dimension $N_{ch} \times N_f$, where
$N_{ch} = L + N_f - 1$ and $N_f$ is the feedforward filter
length.

$$H_{ij}(k) = \begin{pmatrix}
h_{ij}(0;k) & & 0 \\
h_{ij}(1;k) & \ddots & \\
\vdots & \ddots & h_{ij}(0;k) \\
h_{ij}(L_h-1;k) & & h_{ij}(1;k) \\
 & \ddots & \vdots \\
0 & & h_{ij}(L_h-1;k)
\end{pmatrix} \tag{10}$$

where the $h_{ij}$ are obtained from the Kalman filter part.

Applying the feedforward filter to the past $N_f$
received observations and the feedback filter to the past
$N_d$ estimated symbols for each output, we get the soft
estimate:

$$z_i(k) = \sum_{q=1}^{N_f} f_{iq}\, y_i(k-q) - \sum_{q=1}^{N_d} d_{iq}\, \hat{x}_i(k-q) \tag{11}$$

Equalization is achieved via the feedforward filter
$\mathbf{f}_i = (f_{i1}, \ldots, f_{iN_f})^T$ and the feedback filter
$\mathbf{d}_i = (d_{i1}, \ldots, d_{iN_d})^T$. These filters are obtained
by minimizing the MSE cost function with respect to $\mathbf{f}_i$ and $\mathbf{d}_i$:

$$J_i = E\{(x_i(k-\delta) - z_i(k))^2\} \tag{12}$$

where $\delta$ is the equalization delay.

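For illustration purposes, let us consider the simplest
case of a 2 x 2 MIMO system. The soft estimates
at receivers 1 and 2 are:

$$\begin{aligned}
z_1 &= \mathbf{x}_1^T H_{11}\mathbf{f}_1 + \mathbf{x}_2^T H_{21}\mathbf{f}_1 + \mathbf{v}_1^T \mathbf{f}_1 - \hat{\mathbf{x}}_1^T \mathbf{d}_1 \\
z_2 &= \mathbf{x}_1^T H_{12}\mathbf{f}_2 + \mathbf{x}_2^T H_{22}\mathbf{f}_2 + \mathbf{v}_2^T \mathbf{f}_2 - \hat{\mathbf{x}}_2^T \mathbf{d}_2
\end{aligned} \tag{13}$$

where $\mathbf{x}_i = [x_i(k), \ldots, x_i(k - N_{ch} - 1)]^T$ is a vector
of transmitted symbols from user i, $i = \{1,2\}$,
$\hat{\mathbf{x}}_j = [\hat{x}_j(k-1), \ldots, \hat{x}_j(k-N_d)]^T$ is a vector of estimated
symbols from receiver j, $j = \{1,2\}$, and
$\mathbf{v}_j = [v_j(k-1), \ldots, v_j(k-N_f)]^T$ is the noise vector at
receiver j. The past decisions of the equalizer are assumed to
be correct.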

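The gradient of $J_1$ with respect to $\mathbf{d}_1$ is:

$$\nabla_{\mathbf{d}_1} J_1 = 2E\{\hat{\mathbf{x}}_1 x_1(k-\delta) - \hat{\mathbf{x}}_1\mathbf{x}_1^T H_{11}\mathbf{f}_1 - \hat{\mathbf{x}}_1\mathbf{x}_2^T H_{21}\mathbf{f}_1 - \hat{\mathbf{x}}_1\mathbf{v}_1^T\mathbf{f}_1 + \hat{\mathbf{x}}_1\hat{\mathbf{x}}_1^T\mathbf{d}_1\} \tag{14}$$

Assuming that the input sequences are uncorrelated
with each other and with the noise, the above expression
simplifies to:

$$\nabla_{\mathbf{d}_1} J_1 = -2\sigma_x^2 M H_{11}\mathbf{f}_1 + 2\sigma_x^2 \mathbf{d}_1 \tag{15}$$

where M is an $N_d \times N_{ch}$ matrix having the structure
$M = (0_{N_d \times \delta}\ \ I_{N_d \times N_d}\ \ 0_{N_d \times (N_{ch}-N_d-\delta)})$, I is an identity
matrix, $E\{\hat{\mathbf{x}}_1\mathbf{x}_1^T\} = \sigma_x^2 M$ and $\sigma_x^2$ is the variance of the
input signal. The MMSE feedback filter can be written
as:

$$\mathbf{d}_1 = M H_{11}\mathbf{f}_1 \tag{16}$$

In a similar way we find $\mathbf{d}_2 = M H_{22}\mathbf{f}_2$. Note that
during the theoretical derivation, the notation $H_{ij}$ was
used. However, in the receiver we do not have knowledge
of the real channel, thus an estimate $\hat{H}_{ij}$ is used
instead.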

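Similarly, for the feedforward parameters we have:

$$\nabla_{\mathbf{f}_1} J_1 = 2(\sigma_x^2 H_{11}^T H_{11}\mathbf{f}_1 - \sigma_x^2 H_{11}^T M^T \mathbf{d}_1 + \sigma_x^2 H_{21}^T H_{21}\mathbf{f}_1 + \sigma_v^2 \mathbf{f}_1 - \sigma_x^2 H_{11}^T \mathbf{e}_\delta) \tag{17}$$

where $\mathbf{e}_\delta = (0, \ldots, 0, 1, 0, \ldots, 0)^T$ is the standard
basis vector, with one at the position $\delta$, $0 < \delta < N_f$.
Substituting $\mathbf{d}_1 = M H_{11}\mathbf{f}_1$ and denoting
$P_{dfb} = (I - M^T M)$, the MMSE feedforward filter is given by:

$$\mathbf{f}_1 = (H_{11}^T P_{dfb} H_{11} + H_{21}^T H_{21} + \lambda I)^{-1} H_{11}^T \mathbf{e}_\delta \tag{18}$$

For the second receiver we have:
$\mathbf{f}_2 = (H_{22}^T P_{dfb} H_{22} + H_{12}^T H_{12} + \lambda I)^{-1} H_{22}^T \mathbf{e}_\delta$.
A comprehensive derivation for an m x n MIMO case is given in [1].

Finally, the symbol estimate at time k is obtained
by:

$$\hat{x}_i(k) = \arg\min_{a \in \mathcal{A}} |a - z_i(k)| \tag{19}$$

where $\mathcal{A}$ is a finite alphabet.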
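The filter computations in (16) and (18) can be sketched for receiver 1 as below. This is a hedged illustration for real-valued taps, not the authors' code. Two points are assumptions: the regularization constant is taken as $\lambda = \sigma_v^2/\sigma_x^2$ (which follows from setting (17) to zero and dividing by $\sigma_x^2$), and the "position $\delta$" of the unit entry of $\mathbf{e}_\delta$ is read as 1-based. All function names are hypothetical.

```python
import numpy as np

def conv_matrix(h, Nf):
    """(L + Nf - 1) x Nf Toeplitz convolution matrix from taps h[0..L-1]."""
    h = np.asarray(h, dtype=float)
    L = len(h)
    H = np.zeros((L + Nf - 1, Nf))
    for col in range(Nf):
        H[col:col + L, col] = h      # each column is a shifted copy of h
    return H

def selection_matrix(Nd, Nch, delta):
    """M = [0_{Nd x delta}  I_{Nd}  0_{Nd x (Nch - Nd - delta)}]."""
    if Nch - Nd - delta < 0:
        raise ValueError("need Nch >= Nd + delta")
    M = np.zeros((Nd, Nch))
    M[:, delta:delta + Nd] = np.eye(Nd)
    return M

def mmse_dfe(H_direct, H_cross, Nd, delta, lam):
    """Feedforward filter f (18) and feedback filter d (16)."""
    Nch, Nf = H_direct.shape
    M = selection_matrix(Nd, Nch, delta)
    P_dfb = np.eye(Nch) - M.T @ M
    e = np.zeros(Nch)
    e[delta - 1] = 1.0               # assumed 1-based position delta
    lhs = H_direct.T @ P_dfb @ H_direct + H_cross.T @ H_cross + lam * np.eye(Nf)
    f = np.linalg.solve(lhs, H_direct.T @ e)
    d = M @ H_direct @ f
    return f, d
```

For receiver 1 the call would be `mmse_dfe(H11, H21, Nd, delta, lam)`; for receiver 2 the roles are taken by `H22` and `H12`. Solving the linear system directly avoids forming the explicit inverse in (18).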

3.3.  Practical  implementation  of  the  algorithm 

The algorithm operates in two modes:

Training mode. In the training mode only the Kalman
filter is running.

Step 1. Obtain the observations $\mathbf{y}(k)$ and generate the
local training data sequence $X_{\text{training}}(k)$,
$k = 0, \ldots, N_{\text{train}} - 1$, where $N_{\text{train}}$ is the length of
the training sequence.

Step 2. Estimate the channel coefficients $\mathbf{h}(k)$ and noise
statistics [1] by running the Kalman algorithm
described by the set of Equations (9).

Blind mode. In the blind mode both the DFE and Kalman
algorithms are running in an alternating manner. We
assume that $\mathbf{h}(k)$ has been estimated during the
training period and we use estimated symbols instead,
$X_{\text{training}} \approx \hat{X}$.

Step 1. Run the DFE algorithm and estimate the transmitted
sequence $\hat{X}(k)$.




Step 2. Having $\hat{X}(k)$, run the Kalman algorithm and obtain
$\hat{\mathbf{h}}(k)$. The channel estimate $\hat{\mathbf{h}}(k)$ is used at the next
step k + 1 by the DFE.

4.  SIMULATIONS 

In the simulations we use a linearized GMSK signal
[4]. The pulse shape of this modulation is used as the
receive filter impulse response. A training sequence
of 50 symbols is used for the algorithm initialization.
After the training stage, the algorithm keeps tracking
the channel without additional training data. For each
simulation we consider 100 Monte Carlo realizations.

The downlink connection in a cellular communication
system with a carrier frequency of 900 MHz is
considered. The simulations are done for the ’Hilly Terrain’
propagation environment with a receiver speed of 100
km/h (HT100). The corresponding maximum Doppler
shift is 83.3 Hz. The estimated channel magnitudes for
the HT100 direct channels are presented in Figures 2 and
3, respectively. Note that the Kalman algorithm
needs only a few observations in order to find the true
channel coefficients.
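The quoted maximum Doppler shift follows from $f_d = v f_c / c$. A short check, assuming the rounded value $c = 3 \times 10^8$ m/s (the function name is hypothetical):

```python
def max_doppler_hz(speed_kmh, carrier_hz, c=3.0e8):
    """Maximum Doppler shift f_d = v * f_c / c, with v converted to m/s."""
    speed_ms = speed_kmh / 3.6
    return speed_ms * carrier_hz / c

# Receiver speed 100 km/h, carrier frequency 900 MHz
fd = max_doppler_hz(100.0, 900e6)
```

With these inputs `fd` evaluates to approximately 83.3 Hz, in agreement with the value used in the simulations.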

The  symbol  error  rate  (SER)  for  each  user  at  differ¬ 
ent  SNR  is  shown  in  Figure  4.  We  note  that  the  quality 
of  reception  is  different  for  the  two  users.  This  is  due 
to  the  fact  that  we  have  different  IUI  powers  for  the 
two  users. 


Figure 2: The estimated channel magnitude (dashed
line) and the true channel magnitude (solid line) for
the main path $h_{11}$, HT at 100 km/h, SNR = 15 dB.


5.  CONCLUSIONS 

A real-time MIMO TVC equalization technique is
derived. The channel tracking is performed using a Kalman
filter and the transmitted symbol estimation is done
by a novel MIMO DFE structure. The channel is a
fast TVC with the coherence time equal to the symbol
period. The channel model fits the wireless
communication problem well, when the signal arrives

Figure 3: The estimated channel phase (dashed line)
and the true channel phase (solid line) for the main
path $h_{22}$, HT at 100 km/h, SNR = 15 dB.


Figure  4:  SER  vs.  SNR 


at the receiver from different paths with different delay
spread, angular spread and Doppler spread. The
simulation results show that the algorithm achieves
good performance even in very demanding channel
conditions. The main improvement is that it needs only a
single training period at the beginning of the
transmission; after that it runs blindly.
ABSTRACT

In this paper, with a given set of non-Gaussian
measurements, a cumulant based single-input multi-output (SIMO)
blind channel estimation (BCE) algorithm is proposed that
uses multi-input multi-output (MIMO) inverse filter criteria
(blind deconvolution criteria using higher-order cumulants)
proposed by Tugnait, and Chi and Chen. Then a time delay
estimation (TDE) algorithm is proposed that estimates
$P - 1$ time delays from the phase information of the
estimated single-input P-output ($P \geq 2$) system obtained by
the proposed SIMO BCE algorithm. Some simulation results
are presented to support the efficacy of the proposed
SIMO BCE and TDE algorithms.

1.  INTRODUCTION 

Blind channel estimation (BCE) for single-input multi-output
(SIMO) systems is the problem of estimating a P x 1 linear
time-invariant (LTI) system, denoted $\mathbf{h}[n] = (h_1[n], h_2[n],
\ldots, h_P[n])^T$, with only a set of non-Gaussian vector output
measurements $\mathbf{x}[n] = (x_1[n], x_2[n], \ldots, x_P[n])^T$ as follows

$$\mathbf{x}[n] = \sum_{k=-\infty}^{\infty} \mathbf{h}[k]\, u[n-k] + \mathbf{w}[n] \tag{1}$$

where $u[n]$ is the non-Gaussian driving input signal and
$\mathbf{w}[n] = (w_1[n], w_2[n], \ldots, w_P[n])^T$ is additive noise. The
SIMO LTI system arises in science and engineering areas
where multiple sensors are needed, such as time delay
estimation [1] and seismic signal processing. In
communications, multiple antennas receiving signals and
fractionally-spaced signal processing at the receiver can
also be modeled as SIMO LTI systems [2].

This  work  was  supported  by  the  National  Science  Council 
under  Grant  NSC  89-2213-E007-132. 


2.  BCE  FOR  SIMO  LTI  SYSTEMS 

Let $\mathrm{cum}\{y_1, y_2, \ldots, y_p\}$ denote the pth-order joint cumulant
[3] of random variables $y_1, y_2, \ldots, y_p$ and

$$C_{p,q}\{y\} = \mathrm{cum}\{y_1 = y_2 = \cdots = y_p = y,\ y_{p+1} = y_{p+2} = \cdots = y_{p+q} = y^*\} \tag{2}$$

where $y^*$ is the complex conjugate of y. Let $\mathcal{F}\{\cdot\}$ and
$\mathcal{F}^{-1}\{\cdot\}$ denote the discrete-time Fourier transform and
inverse Fourier transform operators, respectively. Assume
that we are given a set of non-Gaussian measurements $\mathbf{x}[n]$,
$n = 0, 1, \ldots, N-1$, modeled by (1) with the following
assumptions:

(M1) $u[n]$ is zero-mean, independent identically distributed
(i.i.d.), non-Gaussian and $C_{p,q}\{u[n]\} \neq 0$ for a chosen
$(p, q)$, where p and q are nonnegative integers and
$p + q \geq 3$.

(M2) The SIMO system $\mathbf{h}[n]$ is stable.

(M3) The noise $\mathbf{w}[n]$ is zero-mean Gaussian (which can be
spatially correlated and temporally colored) and
statistically independent of $u[n]$.
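As an illustration of the cumulant in (2) and of why the Gaussian noise in (M3) does not contribute to it, the sketch below estimates $C_{2,2}\{y\}$ from samples. It uses the standard expression for zero-mean y, $C_{2,2}\{y\} = E[|y|^4] - 2(E[|y|^2])^2 - |E[y^2]|^2$, with sample averages in place of expectations; the function name `c22` is hypothetical and this is not the estimator used in [5].

```python
import numpy as np

def c22(y):
    """Sample estimate of C_{2,2}{y} = cum{y, y, y*, y*}."""
    y = np.asarray(y)
    y = y - y.mean()                       # enforce the zero-mean assumption
    m2 = np.mean(np.abs(y) ** 2)           # E|y|^2
    m4 = np.mean(np.abs(y) ** 4)           # E|y|^4
    return m4 - 2.0 * m2 ** 2 - np.abs(np.mean(y ** 2)) ** 2

rng = np.random.default_rng(0)
gaussian = rng.standard_normal(400_000)            # cumulant close to 0
exponential = rng.exponential(1.0, 400_000) - 1.0  # cumulant close to 6
```

For real data the expression reduces to $E[y^4] - 3(E[y^2])^2$, which vanishes for Gaussian samples and equals 6 for a zero-mean, unit-variance exponential input such as the one used in the simulations.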

Let $\mathbf{v}[n] = (v_1[n], v_2[n], \ldots, v_P[n])^T$ be a P x 1 FIR
inverse filter (deconvolution filter) for which $\mathbf{v}[n] = 0$ for
$n < L_1$ and $n > L_2$, and let $e[n]$ be the inverse filter
output, i.e.,

$$e[n] = \sum_{l=L_1}^{L_2} \mathbf{v}^T[l]\,\mathbf{x}[n-l]
= \sum_{k=-\infty}^{\infty} s[k]\, u[n-k] + \sum_{l=L_1}^{L_2} \mathbf{v}^T[l]\,\mathbf{w}[n-l] \tag{3}$$

where $s[n]$ is the overall system given by

$$s[n] = \sum_{l=L_1}^{L_2} \mathbf{v}^T[l]\,\mathbf{h}[n-l] \tag{4}$$
Chi and Chen [4] design the inverse filter $\mathbf{v}[n]$ by
maximizing the following multi-input multi-output inverse filter
criteria (MIMO-IFC)

$$J_{p,q}(\mathbf{v}[n]) = \frac{|C_{p,q}\{e[n]\}|}{|C_{1,1}\{e[n]\}|^{(p+q)/2}} \tag{5}$$

where p and q are nonnegative integers and $p + q \geq 3$. They
also proposed a fast iterative MIMO-IFC based algorithm
[5] for obtaining the optimum inverse filter $\mathbf{v}[n]$ for $p + q \geq 3$
when $\mathbf{x}[n]$ is real and $p = q \geq 2$ when $\mathbf{x}[n]$ is complex. Based
on the relation between the optimum $\mathbf{v}[n]$ and the MIMO
linear minimum mean square error equalizer reported in [5],
one can show the following fact, on which the BCE algorithm
for SIMO systems below is based:

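Fact 1. Assume that $\mathbf{V}(\omega) = \mathcal{F}\{\mathbf{v}[n]\}$ is the optimum
inverse filter associated with $J_{p,p}(\mathbf{v}[n])$ with $L_1 \to -\infty$ and
$L_2 \to \infty$. Let

$$g_p[n] = s^p[n]\,(s^*[n])^{p-1} \tag{6}$$

$$G_p(\omega) = \mathcal{F}\{g_p[n]\} \tag{7}$$

$$\mathcal{R}(\omega) = \mathcal{F}\{\mathbf{R}[k]\} = \mathcal{F}\{E[\mathbf{x}[n]\,\mathbf{x}^H[n-k]]\}. \tag{8}$$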

Then 

H»  =  (^{h[n]})‘  =  O) 


where  a  is  a  non-zero  constant. 


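SIMO BCE Algorithm:

Step 1. Blind Deconvolution.

With finite data $\mathbf{x}[n]$, obtain the inverse filter $\mathbf{v}[n]$
associated with $J_{p,p}(\mathbf{v}[n])$ using Chi and Chen’s fast MIMO-IFC
algorithm [5], and its $\mathcal{L}$-point FFT $\mathbf{V}(\omega_k)$, where
$\omega_k = 2\pi k/\mathcal{L}$, $k = 0, 1, \ldots, \mathcal{L}-1$. Obtain $\mathcal{R}(\omega_k)$ using
the multichannel Levinson recursion algorithm [6].


Step 2. Channel Estimation.

(S1) Set $i = 0$. Set initial values $\mathbf{H}^{(0)}(\omega_k)$ and
convergence tolerance $\varepsilon > 0$.

(S2) Update i by $i + 1$. Compute

$$S^{(i-1)}(\omega_k) = \mathbf{V}^T(\omega_k)\,\mathbf{H}^{(i-1)}(\omega_k) \tag{10}$$

by (4) and its $\mathcal{L}$-point inverse FFT $s^{(i-1)}[n]$.

(S3) Compute $g_p[n]$ using (6) with $s[n] = s^{(i-1)}[n]$ and
its $\mathcal{L}$-point FFT $G_p(\omega_k)$.

(S4) Compute

$$\mathbf{H}^{(i)}(\omega_k) \tag{11}$$

by (9), which is then normalized by
$\sum_{k=0}^{\mathcal{L}-1} \|\mathbf{H}^{(i)}(\omega_k)\|^2 = 1$.

(S5) If

$$\sum_{k=0}^{\mathcal{L}-1} \|\mathbf{H}^{(i)}(\omega_k) - \mathbf{H}^{(i-1)}(\omega_k)\|^2 > \varepsilon$$

then go to (S2); otherwise $\hat{\mathbf{H}}(\omega_k) = \mathbf{H}^{(i)}(\omega_k)$ (except
for a scale factor) and its $\mathcal{L}$-point inverse FFT
$\hat{\mathbf{h}}[n]$ are obtained.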

Two remarks regarding the proposed SIMO BCE
algorithm are worth making.

(R1) The region of support associated with the estimate
$\hat{\mathbf{h}}[n]$ can be arbitrary as long as the FFT size $\mathcal{L}$ is
chosen sufficiently large so that aliasing effects on
the resultant $\hat{\mathbf{h}}[n]$ are negligible.

(R2) The obtained estimate $\hat{\mathbf{H}}(\omega)$ is robust against
Gaussian noise because (9) is true regardless of the value
of the signal-to-noise ratio (SNR), although the inverse
filter $\mathbf{v}[n]$ and the power spectrum $\mathcal{R}(\omega)$ depend on
the SNR.

3.  TIME  DELAY  ESTIMATION  (TDE) 

In time delay estimation, a single source signal, denoted
$s[n]$, is received by P ($\geq 2$) spatially separate sensors. The
received signal vector $\mathbf{x}[n]$ can be modeled as

$$\mathbf{x}[n] = \mathbf{s}[n] + \mathbf{w}[n]
= (s[n],\ a_1 s[n-d_1],\ \ldots,\ a_{P-1} s[n-d_{P-1}])^T + \mathbf{w}[n] \tag{12}$$

where $a_i$ and $d_i$, $i = 1, 2, \ldots, P-1$, are amplitudes and time
delays, respectively, $s[n]$ is a wide-sense stationary, colored
non-Gaussian signal modeled by

$$s[n] = \sum_{k=-\infty}^{\infty} h[k]\, u[n-k] \tag{13}$$

in which $h[n]$ is a stable LTI system and $u[n]$ is zero-mean,
i.i.d. non-Gaussian, and $\mathbf{w}[n]$ is a P x 1 additive Gaussian
noise vector which can be spatially correlated and
temporally colored.

From (12) and (13), one can easily see that $\mathbf{x}[n]$ can also
be expressed as a SIMO model as follows

$$\mathbf{x}[n] = \sum_{k=-\infty}^{\infty} \mathbf{h}[k]\, u[n-k] + \mathbf{w}[n] \tag{14}$$

where

$$\mathbf{h}[n] = (h[n],\ a_1 h[n-d_1],\ \ldots,\ a_{P-1} h[n-d_{P-1}])^T. \tag{15}$$

Let

$$\mathbf{H}(\omega) = \mathcal{F}\{\mathbf{h}[n]\} \tag{16}$$

$$\boldsymbol{\phi}(\omega) = (\phi_1(\omega), \ldots, \phi_P(\omega))^T = \arg\{\mathbf{H}(\omega)\} \tag{17}$$

$$\mathbf{B}(\omega) = (1,\ e^{j(\phi_2(\omega)-\phi_1(\omega))},\ \ldots,\ e^{j(\phi_P(\omega)-\phi_1(\omega))})^T \tag{18}$$
It can be easily shown that

$$\mathbf{b}[n] = (b_1[n], b_2[n], \ldots, b_P[n])^T = \mathcal{F}^{-1}\{\mathbf{B}(\omega)\}
= (\delta[n],\ \delta[n-d_1],\ \ldots,\ \delta[n-d_{P-1}])^T. \tag{19}$$

TDE Algorithm:

(T1) Process $\mathbf{x}[n]$ using the proposed SIMO BCE algorithm
to estimate $\mathbf{H}(\omega_k)$, $k = 0, 1, \ldots, \mathcal{L}-1$, and
then obtain its phase $\boldsymbol{\phi}(\omega_k)$.

(T2) Obtain $\mathbf{B}(\omega_k)$ using (18) and its inverse $\mathcal{L}$-point FFT
$\mathbf{b}[n]$. Then the estimate $\hat{d}_i$ is obtained as

$$\hat{d}_i = \arg\max_n \{|b_{i+1}[n]|\}, \quad i = 1, 2, \ldots, P-1 \tag{20}$$

by (19).
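Steps (T1) and (T2) can be sketched as follows, assuming the channel frequency responses $\mathbf{H}(\omega_k)$ are already available (here they are synthesized directly rather than produced by the SIMO BCE algorithm), the amplitudes $a_i$ are positive, and the delays are non-negative integers smaller than the FFT size. The function name `delays_from_phase` is hypothetical.

```python
import numpy as np

def delays_from_phase(H):
    """Estimate integer delays from a (P, Lfft) array of responses H_i(w_k).

    Row 0 is the reference channel; the result lists d_1, ..., d_{P-1}.
    """
    phase = np.angle(H)                      # phi_i(w_k), as in (17)
    B = np.exp(1j * (phase - phase[0]))      # phase differences, as in (18)
    b = np.fft.ifft(B, axis=1)               # b_i[n], ideally delta[n - d]
    return [int(np.argmax(np.abs(b[i]))) for i in range(1, H.shape[0])]

# Synthetic check: second channel is a scaled copy delayed by 5 samples
rng = np.random.default_rng(1)
h = np.zeros(32)
h[:8] = rng.standard_normal(8)
H = np.vstack([np.fft.fft(h), np.fft.fft(0.7 * np.roll(h, 5))])
estimated = delays_from_phase(H)
```

Because only the phase difference enters $\mathbf{B}(\omega_k)$, the unknown amplitude scaling between sensors does not affect the location of the peak.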

4.  SIMULATION  RESULTS 

A.  Simulation  Results  for  the  Proposed  SIMO  BCE 
Algorithm 

Consider a 2-channel MA(6) system taken from [7] whose
transfer function was

$$\mathbf{H}(z) = \begin{pmatrix} 0.6140 + 0.3684z^{-1} \\ -0.2579z^{-1} - 0.6140z^{-2} + 0.8842z^{-3} + 0.4421z^{-4} + 0.2579z^{-6} \end{pmatrix} \tag{21}$$

The driving input $u[n]$ was a real zero-mean, exponentially
distributed i.i.d. random sequence with unit variance. The
noise vector $\mathbf{w}[n] = (w_1[n], w_2[n])^T$ was assumed to be
spatially independent and temporally white Gaussian. The
synthetic data $\mathbf{x}[n]$ were processed by the proposed SIMO
BCE algorithm with $p = 2$, FFT length $\mathcal{L} = 64$, $L_1 = 0$
and $L_2 = 7$ for the inverse filter $\mathbf{v}[n]$ and the initial
condition $\mathbf{H}^{(0)}(\omega_k) = \mathbf{1}$ for all k. Thirty independent
realizations were performed for N = 1024, 2048 and 4096, and
SNR = 10 dB, 5 dB, 0 dB and -5 dB, respectively, where
SNR is defined as

$$\mathrm{SNR} = \frac{\sum_{i=1}^{P} E[|x_i[n] - w_i[n]|^2]}{\sum_{i=1}^{P} E[|w_i[n]|^2]}. \tag{22}$$

For comparison, $\mathbf{h}[n]$ is also estimated by Tugnait’s BCE
method [7] as follows:

$$\hat{h}_i[n] = \frac{E[x_i[k]\,e^*[k-n]]}{E[|e[k]|^2]} \tag{23}$$

where $e[n]$ is the optimum inverse filter output associated
with $J_{2,2}$.

Let $\hat{\mathbf{h}}^{(l)}[n]$ denote the estimate of $\mathbf{h}[n]$ at the lth
realization, normalized by a constant energy, with the time
delay between $\hat{\mathbf{h}}^{(l)}[n]$ and the true $\mathbf{h}[n]$ artificially
removed. The normalized mean-square error (NMSE) for the
ith channel estimate $\hat{h}_i[n]$ is defined as

$$\mathrm{NMSE}_i = \frac{1}{30} \sum_{l=1}^{30} \frac{\sum_n |\hat{h}_i^{(l)}[n] - h_i[n]|^2}{\sum_n |h_i[n]|^2} \tag{24}$$

Then the overall NMSE (ONMSE) [7] can be obtained by
averaging $\mathrm{NMSE}_i$ over the P channels as follows:

$$\mathrm{ONMSE} = \frac{1}{P} \sum_{i=1}^{P} \mathrm{NMSE}_i. \tag{25}$$

Table 1 shows the ONMSEs for different values of data
length N and SNR associated with the proposed SIMO
BCE algorithm and Tugnait’s method, respectively. One
can see from Table 1 that the proposed SIMO BCE algorithm
performs much better than Tugnait’s method (smaller
ONMSE).

B. Simulation Results for the Proposed TDE Algorithm

Assume that there were 2 sensor elements ($P = 2$),
the amplitude $a_1 = 1$, the true time delay $d_1 = 5$ and
the driving input $u[n]$ was a real zero-mean, exponentially
distributed i.i.d. random sequence with unit variance. The
system $h[n]$ (see (13)) was a non-minimum phase ARMA(3,2)
system taken from [1]

$$H(z) = \frac{1 - 2.95z^{-1} + 1.9z^{-2}}{1 - 1.3z^{-1} + 1.05z^{-2} - 0.32z^{-3}} \tag{26}$$

and the noise $\mathbf{w}[n]$ was coherent (i.e., $w_1[n] = w_2[n]$), with $w_1[n]$
generated as the output of a first-order MA model [1]

$$H_w(z) = 1 + 0.8z^{-1} \tag{27}$$

driven by white Gaussian noise. The synthetic data $\mathbf{x}[n]$
were processed by the proposed TDE algorithm with $p = 2$,
FFT length $\mathcal{L} = 32$, $L_1 = 0$ and $L_2 = 9$ for the inverse
filter $\mathbf{v}[n]$ and the initial condition $\mathbf{H}^{(0)}(\omega_k) = \mathbf{1}$ for all k.
Thirty independent runs were performed for N = 2048 and
4096, and SNR = 0 dB and -5 dB. For comparison, $d_1$ is
also estimated by Tugnait’s TDE methods [1] as follows:

$$\hat{d}_1 = \arg\max_d \{T_i[d]\}, \quad i = 1 \text{ or } 2 \tag{28}$$

where

$$T_1[d] = \frac{|\mathrm{cum}\{x_1[n-d],\, x_1[n-d],\, x_2[n],\, x_2[n]\}|}{\sqrt{|C_{4,0}\{x_1[n]\}| \cdot |C_{4,0}\{x_2[n]\}|}} \tag{29}$$

$$T_2[d] = \frac{|C_{4,0}\{x_1[n-d] + x_2[n]\}|}{16\sqrt{|C_{4,0}\{x_1[n]\}| \cdot |C_{4,0}\{x_2[n]\}|}}$$
16>/|C4,o{®iN}|-|C4,o{*2[n]}r 
Table 1. ONMSE associated with the proposed SIMO BCE algorithm and Tugnait’s method, respectively.

          Proposed algorithm, SNR (dB)           Tugnait’s method, SNR (dB)
N         10       5        0        -5          10       5        0        -5
1024      0.0358   0.0460   0.1568   0.7055      0.0438   0.0606   0.1846   0.8888
2048      0.0183   0.0228   0.0650   0.5481      0.0210   0.0291   0.0767   0.7979
4096      0.0109   0.0139   0.0354   0.2812      0.0110   0.0163   0.0400   0.4647

Table 2. Mean, standard deviation and RMSE for $\hat{d}_1$ associated with Tugnait’s methods and the proposed TDE algorithm,
respectively. True time delay $d_1 = 5$.

                                 N = 2048                      N = 4096
SNR (dB)  TDE Method             Mean     σ        RMSE        Mean     σ        RMSE
0         T_1[d]                 4.8333   1.5992   1.5811      4.8667   0.9732   0.9661
0         T_2[d]                 5.0333   1.6078   1.5811      5.0000   0.0000   0.0000
0         Proposed Algorithm     5.0000   0.0000   0.0000      5.0000   0.0000   0.0000
-5        T_1[d]                 6.5000   6.0272   6.1128      4.8667   5.5238   4.4497
-5        T_2[d]                 4.9667   5.8101   5.7126      3.3667   4.2221   4.4609
-5        Proposed Algorithm     4.1667   1.8952   2.0412      4.6667   1.2685   1.2910

Table 2 shows the mean, standard deviation (σ) and
root-mean-square error (RMSE) for $\hat{d}_1$ associated with Tugnait’s
methods and the proposed TDE algorithm, respectively.
One can see from Table 2 that the proposed TDE algorithm
performs much better than Tugnait’s methods (smaller
variance and RMSE).

5.  CONCLUSIONS 

We have presented a SIMO BCE algorithm using the cumulant
based MIMO-IFC (see (5)), which is robust against Gaussian
noise, and a TDE algorithm that estimates $P - 1$ time
delays using only the phase information of the single-input
P-output ($P \geq 2$) system estimated by the proposed SIMO
BCE algorithm. Simulation results show that the proposed
SIMO BCE algorithm and TDE algorithm outperform
Tugnait’s channel estimation method and TDE methods,
respectively.
Abstract

This  paper  concerns  the  problem  of  estimating  the  array 
element  relative  gain  and  phase  responses  using  sources 
of  opportunity  in  an  uncertain  multipath  environment. 
Unlike  previous  methods,  which  assume  uncorrelated 
source  wavefronts,  we  propose  to  perform  calibration  in  a 
correlated  multipath  environment.  We  present  two 
algorithms  that  apply  traditional  blind,  single  input 
multiple  output  (SIMO)  methods  to  the  array  calibration 
problem.  Calibration  is  performed  by  discriminating  the 
received  components  which  remain  the  same  for  all 
sources  and  are  thus  due  to  the  receiver  gains  and  phases. 
Simulation  results  demonstrate  the  effectiveness  of  both 
techniques  in  terms  of  reduced  sidelobe  levels. 

1.  Introduction 

Performance of many array processing algorithms (cf.
e.g. [1], [2]) is seriously limited by the knowledge of the
array  response.  The  objective  of  array  calibration  can  be 
defined  as  the  accurate  characterization  or  estimation  of 
the  array  manifold.  Previous  approaches  to  array 
calibration  include  maximum  likelihood  techniques  [3], [4] 
and  eigenstructure  methods  [5], [6], [7].  These  methods 
either  assume  one  or  more  uncorrelated  source  wavefronts 
are  available  which  can  be  used  to  fit  the  sensor 
calibration  factors,  or  assume  specific  array  geometries.  In 
complex  correlated  multipath,  however,  it  is  difficult  to 
accurately  model  the  source  wavefronts  and  this  has  led  to 
development  of  so  called  blind  beamforming  techniques 
which  exploit  alternative  signal  properties  to  estimate  the 
array  calibration  factors.  In  the  techniques  proposed  here, 
array  calibration  is  performed  by  discriminating  the 
received  components  which  remain  the  same  for  all 
sources  and  are  thus  due  to  the  receiver  gains  and  phases. 
Thus to apply SIMO methods, for example, the “single
input” corresponds to the spatial frequency domain
element gain and phase response, while the “multiple
outputs”  correspond  to  the  wavenumber  spectra  of  the 
different  multipath  sources  of  opportunity.  The  observed 
data  is  assumed  to  consist  of  angularly  separated  sources 
of  opportunity  at  the  same  frequency  which  are  measured 
at  different  times  at  a  fixed  sensor  array  whose  element 
locations are known a priori. A technique has been
proposed by Leshem et al. [9], which also identifies the
isomorphism  between  the  SIMO  channel  identification 
problem  and  array  calibration.  However,  their  method 
requires  that  data  be  collected  along  a  fine  grid  in 
azimuth,  which  in  practice  may  not  be  possible. 

In  this  work,  we  present  two  computationally 
efficient  non-iterative  techniques  using  only  the 
approximate  knowledge  of  the  source  location  and  the 
range  of  angles  that  the  multipath  wavefronts  from  a  given 
source  can  take.  The  first  technique  proposed  is  a  least 
squares  technique  using  a  modified  form  of  the  well 
known  cross  relation  (CR)  technique  used  in  blind 
identification  of  multipath  wireless  communication 
channels.  In  the  second  technique,  known  as  the  “Direct 
Method”,  a  solution  to  the  reciprocal  calibration  factors  is 
obtained  in  a  single  step  without  recourse  to  least  squares. 

Simulation  results  from  a  44  element  ULA 
indicate  that  good  performance  is  achieved  even  at  low 
SNR  levels  in  terms  of  reduced  side-lobe  levels  and  array 
gain  degradation. 

2.  SIGNAL  MODEL 

Consider  an  array  of  N  sensors  located  in  a  multipath 
environment  consisting  of  L  angularly  separated  sources. 
From each source, there exist several paths to the array.
It  is  assumed  that  the  multipaths  due  to  a  given  source  are 
correlated.  Since  the  source  is  at  a  fixed  location,  the 
different  multipaths  arrive  from  approximately  the  same 
azimuth,  but  differ  in  the  elevation  angle  of  arrival.  It  is 
assumed  that  the  range  of  elevation  angles  that  the 
multipaths  can  take  is  known  a  priori.  Note  however,  that 
we  do  not  need  to  know  the  values  of  the  elevation  angles 
or the number of multipaths that may exist. Denoting $\theta_l$
as the location of the $l$th source, and $\phi_{l,i}$ as the elevation
angle of the $i$th multipath, the received data from the
$l$th source can be written as:

$$\mathbf{z}_l = \mathbf{G}\,[\,\mathbf{d}(\theta_l,\phi_{l,1}) \;\vdots\; \cdots \;\vdots\; \mathbf{d}(\theta_l,\phi_{l,l_p})\,]\,\mathbf{s}_l + \mathbf{n}_l \qquad (1)$$

where $\mathrm{diag}(\mathbf{G}) = [g_1, g_2, \cdots, g_N]^T$ is the
$N \times 1$ complex gain vector that represents the gain and
phase response of the array, $l_p$ denotes the total number
of multipaths from the $l$th source, with their individual
complex amplitudes and phases contained in $\mathbf{s}_l$, and $\mathbf{n}_l$ is
the additive noise vector with covariance $\sigma^2\mathbf{I}$.
Temporarily  ignoring  the  presence  of  noise,  the  above 
equation  can  be  written  as 

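$$z_l(n) = v_l(n) \odot g(n) \qquad (2)$$

for which

$$\mathbf{v}_l = [\,\mathbf{d}(\theta_l,\phi_{l,1}) \;\vdots\; \mathbf{d}(\theta_l,\phi_{l,2}) \;\vdots\; \cdots\,]\,\mathbf{s}_l \qquad (3)$$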

where $n$ denotes the sensor index, $\odot$ denotes point-wise
multiplication, and $\mathbf{v}_l$ represents the $N \times 1$ replica vector
from the $l$th source. Due to propagation through the
ionosphere, the multipaths arrive only in a small subset of
angles in $[0, \alpha]$. It is therefore possible to represent the
multipaths from a single source with fewer parameters
than the number of sensors $N$, so that

$$\mathbf{v}_l = \mathbf{B}_l\,\mathbf{a}_l \qquad (4)$$

where $\mathbf{B}_l$ is an $N \times K$ matrix ($K < N$) whose
columns correspond to the dominant eigenvectors of the
correlation matrix $\mathbf{R}_v = E[\mathbf{v}_l \mathbf{v}_l^{\dagger}]$, where the averaging
is done over the range of elevation angles of the
multipaths from the $l$th source.

III.  CROSS  RELATION  TECHNIQUE 

We  now  incorporate  the  well  known  cross  relation 
technique  [8]  to  estimate  the  calibration  factors.  The 
multiplication  of  the  sequences  in  equation  (2)  can  be 
replaced  by  circular  convolution  in  the  frequency  domain 

as

$$z_l(u) = g(u) \circledast v_l(u), \qquad (5)$$

where $g(u)$ denotes the DFT of the sequence of sensor
gains, $v_l(u)$ represents the DFT of the received replica
vector, and $\circledast$ denotes circular convolution. The above
equation is analogous to the convolution of the source
signal with the $l$th channel in the blind SIMO
multichannel identification problem. However, it is
important to note that in the SIMO identification problem,
the source is “linearly” convolved with the channel. A
cross relation between the $l$th and the $j$th source can be
expressed as

$$\mathbf{Z}\mathbf{v} = \mathbf{0} \qquad (6)$$

where $\mathbf{v} = [\mathbf{v}_j^T \;\vdots\; \mathbf{v}_l^T]^T$ and $[\mathbf{v}_l]_u = v_l(u)$. The
matrix $\mathbf{Z}$ is given by $\mathbf{Z} = [\mathbf{Z}_l \;\vdots\; -\mathbf{Z}_j]$, where $\mathbf{Z}_l$ is an
$N \times N$ circulant matrix formed by $\mathbf{z}_l$, which represents
the received data in the spatial frequency domain.
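
The step from the point-wise product in equation (2) to the circular convolution in equation (5) relies on a standard DFT property, which the following minimal sketch checks numerically. It assumes the unnormalised forward DFT, under which a scale factor 1/N appears that equation (5) leaves implicit; the helper names are illustrative and not taken from the paper.

```python
import cmath


def dft(x):
    """Unnormalised forward DFT of a sequence (direct O(N^2) evaluation)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * u * k / n) for k in range(n))
            for u in range(n)]


def circular_convolve(a, b):
    """Circular convolution of two equal-length sequences."""
    n = len(a)
    return [sum(a[m] * b[(u - m) % n] for m in range(n)) for u in range(n)]


def dft_of_product(v, g):
    """DFT of the point-wise product v(n) g(n) across the sensor index n."""
    return dft([vk * gk for vk, gk in zip(v, g)])


def scaled_convolution_of_dfts(v, g):
    """(1/N) times the circular convolution of the two DFTs."""
    n = len(v)
    return [c / n for c in circular_convolve(dft(v), dft(g))]
```

For any replica vector `v` and gain vector `g` of the same length, `dft_of_product(v, g)` and `scaled_convolution_of_dfts(v, g)` agree to numerical precision.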

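Using the reduced parameter expression for the
replica vector, we have

$$\mathbf{v} = \boldsymbol{\Theta}\,\mathbf{a} \qquad (7)$$

where $\mathbf{a} = [\mathbf{a}_j^T \;\vdots\; \mathbf{a}_l^T]^T$ and the $2N \times 2K$ matrix $\boldsymbol{\Theta}$ can be written as

$$\boldsymbol{\Theta} = \begin{bmatrix} \boldsymbol{\Theta}_j & \mathbf{0} \\ \mathbf{0} & \boldsymbol{\Theta}_l \end{bmatrix} \qquad (8)$$

The columns of $\boldsymbol{\Theta}_l$ are the DFTs of the columns of
$\mathbf{B}_l$. In order to estimate the vector $\mathbf{v}$ in the presence of
noise, equation (6) can be reformulated as the
minimization of $\|\mathbf{Z}\mathbf{v}\|^2$:

$$\hat{\mathbf{a}} = \arg\min_{\mathbf{a}}\; \mathbf{a}^{\dagger}\boldsymbol{\Theta}^{\dagger}\mathbf{Z}^{\dagger}\mathbf{Z}\boldsymbol{\Theta}\,\mathbf{a} \qquad (9)$$

subject to the constraint $\mathbf{a}^{\dagger}\mathbf{a} = 1$. The above
minimization can be achieved by calculating the eigenvector
of $\boldsymbol{\Theta}^{\dagger}\mathbf{Z}^{\dagger}\mathbf{Z}\boldsymbol{\Theta}$ with minimum eigenvalue. Given
$\hat{\mathbf{a}}$, we can estimate $\mathbf{v}$ using equation (7) as $\hat{\mathbf{v}} = \boldsymbol{\Theta}\hat{\mathbf{a}}$.
Consequently, a least squares estimate of the vector of the
DFT of the complex sensor gains $\mathbf{g}$ can be calculated as

$$\hat{\mathbf{g}} = \hat{\mathbf{V}}^{\#}\,\mathbf{z} \qquad (10)$$

where $\mathbf{z} = [\mathbf{z}_l^T \;\vdots\; \mathbf{z}_j^T]^T$, $\hat{\mathbf{V}} = [\hat{\mathbf{V}}_l^T \;\vdots\; \hat{\mathbf{V}}_j^T]^T$ and $\hat{\mathbf{V}}_l$ is the
circulant form of $\hat{\mathbf{v}}_l$.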

IV DIRECT METHOD

In this method, the gains and phases are estimated in a
single step without the use of the least squares technique
used above. Using Equations (1) and (3) we have,

$$\mathbf{G}^{-1}\mathbf{z}_l = \mathbf{v}_l \qquad (11)$$

where $\mathbf{G}$ is assumed to be invertible. Since $\mathbf{G}$ is
diagonal, the above condition, which is equivalent to all
the complex gains having magnitude greater than 0,
is satisfied in practice. Further, since $\mathbf{G}^{-1}$ is diagonal, the
above equation can be expressed as

$$\mathbf{Z}_l\,\mathbf{g} = \mathbf{v}_l \qquad (12)$$

where $\mathbf{g} = \mathrm{diag}(\mathbf{G}^{-1})$ and $\mathbf{Z}_l = \mathrm{diag}(\mathbf{z}_l)$. Using
equations (11) and (12) we obtain

$$\mathbf{Z}\mathbf{c} = \mathbf{0} \qquad (13)$$

where

$$\mathbf{Z} = \begin{bmatrix} \mathbf{Z}_l & -\mathbf{B}_l & \mathbf{0} \\ \mathbf{Z}_j & \mathbf{0} & -\mathbf{B}_j \end{bmatrix} \qquad (14)$$

while $\mathbf{c} = [\mathbf{g}^T \;\vdots\; \mathbf{a}_l^T \;\vdots\; \mathbf{a}_j^T]^T$, $\mathbf{Z}_l = \mathrm{diag}(\mathbf{z}_l)$ and
$[\mathbf{g}]_n = g(n)$. Clearly, the complex gains can be estimated
by estimating the null space of $\mathbf{Z}$. In the presence of noise,
we can estimate $\mathbf{c}$ by the minimization of $\|\mathbf{Z}\mathbf{c}\|^2$ as

$$\hat{\mathbf{c}} = \arg\min_{\mathbf{c}}\; \mathbf{c}^{\dagger}\mathbf{Z}^{\dagger}\mathbf{Z}\,\mathbf{c} \qquad (15)$$

subject to the constraint $\mathbf{c}^{\dagger}\mathbf{c} = 1$. This is achieved by
calculating the eigenvector of the matrix $\mathbf{Z}^{\dagger}\mathbf{Z}$ with minimum
eigenvalue.

Since the gains and phases are estimated in a single step,
this method was found to be much faster than the cross
relation based method in the simulations performed.

V SIMULATION RESULTS

Simulations were performed using a 44 element uniformly
spaced linear array (ULA) with inter-element spacing of
$\lambda/2$. The frequency of the sources was chosen to be
10 MHz. In all simulations, 4 multipaths were assumed to
exist from each source to the sensor array. The complex
sensor gains were chosen to be i.i.d. complex Gaussian
random numbers. Since the gains and phases are estimated
only up to a complex multiplicative constant, the
performance metric that we use is the array gain
degradation (AGD), which is defined as
$\mathrm{AGD} = -E\{20 \log \cos(\theta)\}$, where

$$\cos(\theta) = \frac{|\hat{\mathbf{g}}^{\dagger}\mathbf{g}|}{\|\hat{\mathbf{g}}\|\,\|\mathbf{g}\|}$$

is the generalized cosine of the angle between the true
gain $\mathbf{g}$ and the estimated gain $\hat{\mathbf{g}}$. Under the ideal case
wherein the estimated gain vector is a complex scalar
multiple of the true gain, the AGD takes on the value 0.
The calibration factors were estimated using the Direct
Method and the beampatterns obtained using these
coefficients were compared with the case when the sensor
gains are known exactly and with an uncalibrated array.
The beampatterns for the uncalibrated, calibrated and
exactly known gain cases are shown in Figure 1. The SNR
for this simulation was set at 20 dB. Figure 2 shows the
results obtained with the SNR set at 40 dB. As can be seen,
there is a significant improvement in the sidelobe levels
with an increase in SNR, which is intuitively satisfying.
The AGD was computed using both methods as a function
of SNR, with 50 trials being conducted at each SNR
value. Figures 3 and 4 show the variation of AGD with
SNR for the cross relation method and the “Direct”
method respectively, for different source locations. As can
be seen, both methods perform slightly better when the
sources are close to broadside. This is due to the fact that
fewer parameters are required to characterize a source
closer to broadside and therefore the replica vector can be
more accurately represented.

VI CONCLUSIONS

In this paper we present two methods for array
gain and phase calibration using multipath sources of
opportunity. Multichannel blind system identification
techniques are applied to yield computationally efficient
solutions. Simulation results demonstrate the potential of
the proposed algorithms.
Figure 1: Beam pattern for Direct Method, 20 dB SNR

Figure 2

Figure 3: Array Gain Degradation

Figure 4: Array Gain Degradation vs. SNR for Direct
Calibration Method
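
The single-step estimate of the Direct Method (Section IV) and the AGD metric (Section V) can be illustrated with the noise-free sketch below. It is not the authors' simulation: the subspace bases `B_l` and `B_j` are random orthonormal stand-ins rather than eigenvectors of a steering-vector correlation matrix, the array is reduced to 8 sensors with K = 2, and the block layout of `Z` is an assumption that follows from $\mathbf{Z}_l\mathbf{g} = \mathbf{B}_l\mathbf{a}_l$ for each source. All function names are hypothetical.

```python
import numpy as np


def direct_method(z_l, z_j, B_l, B_j):
    """Estimate the reciprocal gains (up to a complex scale) from two sources."""
    N, K = B_l.shape
    zeros = np.zeros((N, K), dtype=complex)
    # Stack Z_l g - B_l a_l = 0 and Z_j g - B_j a_j = 0 into Z c = 0.
    Z = np.block([[np.diag(z_l), -B_l, zeros],
                  [np.diag(z_j), zeros, -B_j]])
    # The right singular vector of Z for its smallest singular value is the
    # eigenvector of Z^H Z with minimum eigenvalue.
    _, _, Vh = np.linalg.svd(Z)
    c = Vh[-1].conj()
    return c[:N]


def agd_db(g_true, g_est):
    """Array gain degradation in dB for one trial (no averaging over trials)."""
    cos_theta = abs(np.vdot(g_est, g_true)) / (
        np.linalg.norm(g_est) * np.linalg.norm(g_true))
    return -20.0 * np.log10(cos_theta)


def demo(N=8, K=2, seed=0):
    """Noise-free toy run; returns the AGD of the recovered gains."""
    rng = np.random.default_rng(seed)

    def crandn(*shape):
        return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

    B_l, _ = np.linalg.qr(crandn(N, K))   # stand-in subspace for source l
    B_j, _ = np.linalg.qr(crandn(N, K))   # stand-in subspace for source j
    g_true = 1.0 + 0.3 * crandn(N)        # hypothetical sensor gains
    z_l = g_true * (B_l @ crandn(K))
    z_j = g_true * (B_j @ crandn(K))
    g_est = 1.0 / direct_method(z_l, z_j, B_l, B_j)
    return agd_db(g_true, g_est)
```

In this noise-free setting the recovered gain vector is a complex scalar multiple of the true one, so `demo()` returns an AGD of essentially 0 dB. The null vector is only unique when the two subspaces are small enough relative to the number of sensors, which is why K is kept well below N/2 here.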
ABSTRACT

In future wireless telecommunication systems, capacity
requirements will lead operators to carry simultaneous
communications at the same time and on the same frequency.
When the system benefits from diversity, for example channel
diversity (spatial separation of emitters) or code diversity
(CDMA, space-time coding), the transmission may be
assured with reasonable quality. Classical channel
estimation techniques for the single emission case do not
offer sufficient performance in the multi-emission case.

We propose a cooperative maximum likelihood (ML) channel
estimator adapted to the emitted signals in the multi-emitter
(multi-user) context. This estimator relies on the
hypothesis that the propagation channel is specular. The
Cramer-Rao bound (CRB) is derived and compared to the
performance of the proposed ML estimator. The empirical
performance obtained from Monte-Carlo simulations shows
that this estimator is efficient at high SNR.

1.  INTRODUCTION 

The needs in terms of wireless communication capacity are
increasing dramatically with video and music demand. On
the other hand, the released frequencies are still limited. As
has recently been proved, the capacity of a MIMO (multiple
input multiple output) system is increased compared to
a classical single input, single output system [6]. To meet
these capacities, accurate MIMO channel estimation
techniques should be used for the demodulator to perform well.
We propose a cooperative maximum likelihood (ML) channel
estimator taking into account the knowledge of the emitted
signals in the multi-emitter (multi-user) context. This
estimator relies on the hypothesis that the propagation
channel is specular. The Cramer-Rao bound (CRB) is derived
and compared to the performance of the proposed ML
estimator. The empirical performance obtained from
Monte-Carlo simulations shows that this estimator is efficient
at high SNR. The proposed estimator uses multiple sensors at
reception. We suppose that the antenna array is not calibrated.
If the antenna is calibrated, the directions of arrival may be
estimated and the modified estimator may be found in [4].
A similar work in the radar context for the single emission
case may be found in [2].

For CDMA systems, a typical simultaneous resource
sharing system, far more work has been done on
demodulators than on channel estimation ([10], [7] and [1]).

The single user techniques [11] are applicable in
UMTS (CDMA) systems when the number of users is small
but rapidly degrade when the number of users increases.
These single user techniques have the same weakness as
the RAKE receiver when multiple users transmit in the
same cell.

The work proposed in [8] estimates the propagation
delays using a bank of non coherent detectors. The results are
presented for multi-path propagation channels, with Doppler
and inter-cell interference. In [9] the authors propose an
optimal decision rule based on the output of these correlators.
These techniques are single user based and thus are limited
in the multi-user case.

Many proposed estimators based on the ML ([10], [7]
and [1]) use rectangular pulse shape filters and are thus
unfortunately inapplicable when filters longer than a chip are
used, as in real systems.

In the following, we first present the signal model for a
multi-emission system with specular propagation channels.
Then, in Section 3, the proposed estimator is developed and
the theoretical bounds are derived. Finally, in Section 4,
we present simulation results and compare them
to the theoretical bounds, followed by the conclusion and
some perspectives.

2.  SIGNAL  MODEL 

We suppose that several emitters (users) are transmitting
simultaneously at the same frequency. Each emitter $u$ is
supposed to transmit a known signal (pilot signal) $s^u(t)$, as is
done in the UMTS FDD standard. This pilot signal is generally
used to estimate the propagation channel so that the
demodulator may estimate the transmitted symbols. To simplify
the presentation, we will consider that the received signal
is only composed of the pilot signal, the data signals being
considered as additive Gaussian noise. As supposed
previously, the propagation channel is considered to be specular,
meaning that it may be written as:

$$\mathbf{h}^u(t) = \sum_{p=1}^{P^u} \mathbf{h}_p^u\,\delta(t - \tau_p^u)$$

where $P^u$ is the number of paths of the propagation channel
of user $u$, $\mathbf{h}_p^u$ and $\tau_p^u$ are respectively the vector of the
responses of the antenna and the delay associated with path
$p$ of user $u$, and $\delta(\cdot)$ is the Dirac function.

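The received signal $\mathbf{x}(t)$ is given in equation (1):

$$\mathbf{x}(t) = \sum_{u=1}^{U} \sum_{p=1}^{P^u} \mathbf{h}_p^u\, s^u(t - \tau_p^u) + \mathbf{b}(t) \qquad (1)$$

where $U$ is the number of users, $s^u(t)$ is the known signal
of user $u$ and $\mathbf{b}(t)$ is the noise vector at time $t$.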

This signal is sampled every $T_e$ over a period $t \in [T_e, N_e T_e]$
during which the $\mathbf{h}_p^u$ are considered to be constant. These
$N_e$ samples of dimension $N \times 1$ are concatenated in a vector
$\mathbf{Y}$ of dimension $N_e N \times 1$ verifying:

$$\mathbf{Y} = [\,\underbrace{x_1(T_e), \cdots, x_1(N_e T_e)}_{\text{sensor } 1}, \cdots, \underbrace{x_N(T_e), \cdots, x_N(N_e T_e)}_{\text{sensor } N}\,]^T$$

where $x_i(nT_e)$ is the sample $n$ of the sensor $i$.

The vector $\mathbf{N}$ of dimension $N_e N \times 1$ contains the
concatenation of the noise samples:

$$\mathbf{N} = [\,\underbrace{b_1(T_e), \cdots, b_1(N_e T_e)}_{\text{sensor } 1}, \cdots, \underbrace{b_N(T_e), \cdots, b_N(N_e T_e)}_{\text{sensor } N}\,]^T$$

where $b_i(nT_e)$ is the noise sample $n$ on sensor $i$.
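The signal vector $\mathbf{Y}$ may be written as:

$$\mathbf{Y} = \boldsymbol{\Psi}(\boldsymbol{\tau})\,\mathbf{a} + \mathbf{N} \qquad (2)$$

where the ($N_e N \times NP$) matrix $\boldsymbol{\Psi}$ contains the samples of
the $s^u(\tau_p^u)$ as follows:

$$\boldsymbol{\Psi}(\boldsymbol{\tau}) = \mathbf{I}_N \otimes \mathbf{S}(\boldsymbol{\tau})$$

where $\mathbf{S}(\boldsymbol{\tau})$ is the ($N_e \times P$) matrix verifying:

$$\mathbf{S}(\boldsymbol{\tau}) = [\,\mathbf{s}^1(\tau_1^1), \cdots, \mathbf{s}^1(\tau_{P^1}^1), \cdots, \mathbf{s}^U(\tau_{P^U}^U)\,]$$

with

$$\mathbf{s}^u(\tau_p^u) = [\, s^u(T_e - \tau_p^u) \;\cdots\; s^u(N_e T_e - \tau_p^u) \,]^T$$

$\mathbf{a}$ contains the responses of the sensors to the paths of the
users:

$$\mathbf{a} = [\,\underbrace{h_{1,1}^1, \cdots, h_{P^U,1}^U}_{\text{sensor } 1}, \cdots, \underbrace{h_{1,N}^1, \cdots, h_{P^U,N}^U}_{\text{sensor } N}\,]^T$$

The model of the received signals in equation (2)
is linear in the nuisance parameters $\mathbf{a}$ and in the noise $\mathbf{N}$.
As the noise is considered as Gaussian, temporally and
spatially white, the log-likelihood of the received signal can be
easily deduced.


3. ML ESTIMATOR AND CRAMER RAO BOUND


3.1. ML Estimator

We consider here that the complex amplitudes $\mathbf{a}$ are
unknown but deterministic. The signals used in $\boldsymbol{\Psi}(\boldsymbol{\tau})$ are
supposed to be known but parametrized by the variables $\boldsymbol{\tau}$ to
be estimated. This leads to a model where only the noise
$\mathbf{N}$ is random, with Gaussian components, and thus the
log-likelihood is given by:

$$L(\mathbf{Y} \mid \sigma^2, \mathbf{a}, \boldsymbol{\tau}) = -N_e N \log(\pi\sigma^2) - \frac{1}{\sigma^2}\,\|\mathbf{Y} - \boldsymbol{\Psi}(\boldsymbol{\tau})\,\mathbf{a}\|^2 \qquad (3)$$

The remainder of this section is dedicated to the estimation
of the parameters $\sigma^2$, $\mathbf{a}$ and $\boldsymbol{\tau}$. Recall that the antenna is
not calibrated and thus the DOAs are not estimated.

We may determine the analytical expressions of estimators
for the complex amplitudes $\mathbf{a}$ and for the power of the
noise $\sigma^2$, parametrized by the vector of delays $\boldsymbol{\tau}$. These
estimators are obtained by setting to zero the derivatives of the
log-likelihood with respect to $\mathbf{a}$ and $\sigma^2$. We get:

$$\hat{\sigma}^2 = \frac{1}{N_e N}\,\|\mathbf{Y} - \boldsymbol{\Psi}(\boldsymbol{\tau})\,\hat{\mathbf{a}}\|^2$$

and

$$\hat{\mathbf{a}} = \left(\boldsymbol{\Psi}^{\dagger}(\boldsymbol{\tau})\boldsymbol{\Psi}(\boldsymbol{\tau})\right)^{-1}\boldsymbol{\Psi}^{\dagger}(\boldsymbol{\tau})\,\mathbf{Y} = \boldsymbol{\Psi}^{\#}(\boldsymbol{\tau})\,\mathbf{Y} \qquad (4)$$

By replacing $\mathbf{a}$ and $\sigma^2$ by their estimators, the log-likelihood
simplifies and the estimator of $\boldsymbol{\tau}$ is given by:

$$\hat{\boldsymbol{\tau}} = \arg\min_{\boldsymbol{\tau}} \left\|\mathbf{Y} - \boldsymbol{\Psi}(\boldsymbol{\tau})\left(\boldsymbol{\Psi}^{\dagger}(\boldsymbol{\tau})\boldsymbol{\Psi}(\boldsymbol{\tau})\right)^{-1}\boldsymbol{\Psi}^{\dagger}(\boldsymbol{\tau})\,\mathbf{Y}\right\|^2$$

Let $\boldsymbol{\Pi}_{\Psi}^{\perp}(\boldsymbol{\tau})$ be the projector onto the orthogonal complement
of the space spanned by the columns of $\boldsymbol{\Psi}(\boldsymbol{\tau})$:

$$\boldsymbol{\Pi}_{\Psi}^{\perp}(\boldsymbol{\tau}) = \mathbf{I} - \boldsymbol{\Psi}(\boldsymbol{\tau})\left(\boldsymbol{\Psi}^{\dagger}(\boldsymbol{\tau})\boldsymbol{\Psi}(\boldsymbol{\tau})\right)^{-1}\boldsymbol{\Psi}^{\dagger}(\boldsymbol{\tau})$$

The estimate of $\boldsymbol{\tau}$ is given by:

$$\hat{\boldsymbol{\tau}} = \arg\min_{\boldsymbol{\tau}} \left\|\boldsymbol{\Pi}_{\Psi}^{\perp}(\boldsymbol{\tau})\,\mathbf{Y}\right\|^2 = \arg\min_{\boldsymbol{\tau}} \operatorname{tr}\left(\mathbf{Y}^{\dagger}\,\boldsymbol{\Pi}_{\Psi}^{\perp}(\boldsymbol{\tau})\,\mathbf{Y}\right)$$

This estimator will be compared to the theoretical Cramer-Rao
bound. To implement the ML estimator, a Gauss-Newton
algorithm has been used (for more details see [4]).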
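
The concentrated criterion above can be evaluated directly for a candidate delay, as in the minimal sketch below. It is only an illustration under stated assumptions: a single user with a single path, a hypothetical Gaussian pilot pulse (`pulse`) in place of a UMTS pilot waveform, and a plain grid search instead of the Gauss-Newton iteration used in the paper. All function names are illustrative.

```python
import numpy as np


def pulse(t, centre=10.0, width=2.0):
    """Hypothetical pilot waveform s(t): a Gaussian pulse (an assumption)."""
    return np.exp(-0.5 * ((t - centre) / width) ** 2)


def S_matrix(taus, Ne, Te=1.0):
    """Ne x P matrix whose columns are the delayed, sampled pilot signals."""
    t = Te * np.arange(1, Ne + 1)
    return np.column_stack([pulse(t - tau) for tau in taus]).astype(complex)


def ml_cost(Y, taus, N, Ne):
    """Squared norm of the projection of Y onto the complement of span(Psi)."""
    Psi = np.kron(np.eye(N), S_matrix(taus, Ne))      # Psi = I_N (x) S(tau)
    a_hat, *_ = np.linalg.lstsq(Psi, Y, rcond=None)   # a_hat = Psi^# Y
    residual = Y - Psi @ a_hat                        # Pi_perp(tau) Y
    return float(np.vdot(residual, residual).real)


def grid_search(Y, N, Ne, grid):
    """Return the grid delay that minimises the concentrated ML cost."""
    costs = [ml_cost(Y, [tau], N, Ne) for tau in grid]
    return float(grid[int(np.argmin(costs))])
```

For example, with two sensors, amplitudes `h = np.array([1 + 0.5j, -0.7j])` and noise-free data `Y = np.kron(np.eye(2), S_matrix([7.3], 40)) @ h`, a search over `np.linspace(0.0, 15.0, 151)` returns the grid point at 7.3. A grid search scales poorly with the number of paths, which is one reason an iterative method such as Gauss-Newton is preferable in the multi-user case.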


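3.2. Cramer Rao Bound

In this section we determine the statistical performance
of the ML estimator in terms of estimation variance. The
Cramer-Rao bounds are calculated and give the minimum
variance reachable by an unbiased estimator at high
SNR. Once these bounds are given, they are compared to
the Monte-Carlo simulations giving empirical estimates
of the variance of the estimator.

The Cramer-Rao bounds (CRB) give the lower variance
limit for an unbiased estimator. These limits are given
by:

$$\operatorname{var}(\hat{\mathbf{q}}) \geq \mathrm{CRB} = \mathbf{J}_q^{-1}$$

where $\mathbf{J}_q$ is the Fisher information matrix:

$$\mathbf{J}_q = E\left[\frac{\partial L(\mathbf{q})}{\partial \mathbf{q}} \left(\frac{\partial L(\mathbf{q})}{\partial \mathbf{q}}\right)^{\dagger}\right] \qquad (5)$$

and where $\mathbf{q}$ is the complex vector of the wanted parameters:

$$\mathbf{q} = [\,\sigma^2,\; \mathbf{a}^T,\; (\mathbf{a}^{*})^T,\; \boldsymbol{\tau}^T\,]^T \qquad (6)$$

$\mathbf{a}$, $\mathbf{a}^{*}$ and $\sigma^2$ are nuisance parameters and $\boldsymbol{\tau}$ are the useful
(delay) parameters.

The choice of the complex notation in the Fisher matrix
is motivated by the simplifications it introduces in the
calculation of the inverse of the bloc $[\mathbf{J}_q^{-1}]_{\tau}$ on the delay
parameters. Such an approach is much simpler than an
approach isolating the real and imaginary parts, as is done in
[3].

The matrix $\mathbf{J}_q$ has the following bloc structure:

$$\mathbf{J}_q = \begin{bmatrix}
J_{\sigma^2} & \mathbf{J}_{\sigma^2 a} & \mathbf{J}_{\sigma^2 a^{*}} & \mathbf{J}_{\sigma^2 \tau} \\
[\mathbf{J}_{\sigma^2 a}]^{\dagger} & \mathbf{J}_{a} & \mathbf{J}_{a a^{*}} & \mathbf{J}_{a \tau} \\
[\mathbf{J}_{\sigma^2 a^{*}}]^{\dagger} & [\mathbf{J}_{a a^{*}}]^{\dagger} & \mathbf{J}_{a^{*}} & \mathbf{J}_{a^{*} \tau} \\
[\mathbf{J}_{\sigma^2 \tau}]^{\dagger} & [\mathbf{J}_{a \tau}]^{\dagger} & [\mathbf{J}_{a^{*} \tau}]^{\dagger} & \mathbf{J}_{\tau}
\end{bmatrix} \qquad (7)$$

where the non null blocs are given by:

$$J_{\sigma^2} = \frac{N_e \cdot N}{\sigma^4}$$

$$\mathbf{J}_{a} = \frac{1}{\sigma^2} \cdot \left(\boldsymbol{\Psi}^{\dagger}(\boldsymbol{\tau}) \cdot \boldsymbol{\Psi}(\boldsymbol{\tau})\right) \qquad (8)$$

$$[\mathbf{J}_{a\tau}]_{:,i} = \frac{1}{\sigma^2} \cdot \left(\boldsymbol{\Psi}^{\dagger}(\boldsymbol{\tau}) \cdot \mathbf{D}_{\Psi}(\boldsymbol{\tau}) \cdot \mathbf{H}_d \cdot \mathbf{e}_i\right) \qquad (9)$$

with

$$\mathbf{e}_i = [\,0 \;\cdots\; 0 \;\; 1 \;\; 0 \;\cdots\; 0\,]^T \quad \text{(the 1 being in position } i\text{)}$$

and where

$$\mathbf{H}_d = [\,\operatorname{diag}(\mathbf{h}_1), \cdots, \operatorname{diag}(\mathbf{h}_N)\,]^T$$

$$\mathbf{D}_{\Psi}(\boldsymbol{\tau}) = (\mathbf{I}_N \otimes \mathbf{D}_S)$$

$$\mathbf{D}_S(\boldsymbol{\tau}) = \left[\frac{\partial \mathbf{s}_1(\tau)}{\partial \tau}, \cdots, \frac{\partial \mathbf{s}_P(\tau)}{\partial \tau}\right]$$

For the terms relative to the delay parameters:

$$\mathbf{J}_{\tau} = \frac{2}{\sigma^2} \cdot \Re\left(\mathbf{H}_d^{\dagger} \cdot \mathbf{D}_{\Psi}^{\dagger} \cdot \mathbf{D}_{\Psi} \cdot \mathbf{H}_d\right) \qquad (10)$$

Thus the Fisher information matrix becomes:

$$\mathbf{J}_q = \begin{bmatrix}
J_{\sigma^2} & \mathbf{0} & \mathbf{0} & \mathbf{0} \\
\mathbf{0} & \mathbf{J}_{a} & \mathbf{0} & \mathbf{J}_{a \tau} \\
\mathbf{0} & \mathbf{0} & \mathbf{J}_{a^{*}} & \mathbf{J}_{a^{*} \tau} \\
\mathbf{0} & [\mathbf{J}_{a \tau}]^{\dagger} & [\mathbf{J}_{a^{*} \tau}]^{\dagger} & \mathbf{J}_{\tau}
\end{bmatrix}$$

Using the equalities $\mathbf{J}_{a\tau} = (\mathbf{J}_{a^{*}\tau})^{*}$ and $\mathbf{J}_{a} = (\mathbf{J}_{a^{*}})^{*}$,
and the bloc inversion lemma, we get:

$$\mathrm{BCR}(\boldsymbol{\tau}) = [\mathbf{J}_q^{-1}]_{\tau} = \left[\mathbf{J}_{\tau} - 2\,\Re\left\{\mathbf{J}_{a\tau}^{\dagger}\,\mathbf{J}_{a}^{-1}\,\mathbf{J}_{a\tau}\right\}\right]^{-1}$$

which simplifies to:

$$\mathrm{BCR}(\boldsymbol{\tau}) = \frac{\sigma^2}{2} \cdot \left[\Re\left\{\mathbf{H}_d^{\dagger} \cdot \mathbf{D}_{\Psi}^{\dagger} \cdot \boldsymbol{\Pi}_{\Psi}^{\perp} \cdot \mathbf{D}_{\Psi} \cdot \mathbf{H}_d\right\}\right]^{-1}$$

This general expression of the CRB relative to the parameters
$\boldsymbol{\tau}$ gives, in the single path scenario, the classical result.
Indeed, in this case, the matrix $\mathbf{J}_q$ becomes diagonal and

$$[\mathbf{J}_q^{-1}]_{\tau} = \mathbf{J}_{\tau}^{-1}.$$


4. PERFORMANCES AND FURTHER DEVELOPMENTS

We compare in Figure 1 the performance of the ML estimator
to the CRB. The signals used for the simulations are
UMTS-FDD-like waveforms (CDMA) where only the pilot channel is
generated and known for all the users. This means that
the received signals do not contain any data signals. The
spreading factor is set to 32 and the observations are 320
samples long, with a 2-sensor reception antenna in a cell
containing 4 users. Each user's signal propagates through a
2-path channel. The optimization of the ML criterion has been
done with a Gauss-Newton algorithm. Note that the simulation is
made in a typical reduced UMTS-FDD scenario. The abscissa of
the simulation plots is $E_b/N_0$, the energy per bit over the
noise power.

Figure 1: CRB and ML for the delays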

We notice, in the figure, that when the performance
(variance of the estimate of the delay parameter) for
one of the paths of one of the users is degraded,
the performance for that user's other path is also degraded.
On the other hand, the performance for the other users is
not affected by the performance loss on the estimated delays
of the degraded user.

In the high SNR regime, the variance of the ML estimator
coincides with the CRB and thus the ML estimator is
efficient. A more detailed demonstration showing that the ML
variance is equal to the CRB at high SNR is given in [4]
and in [5].

At low SNRs, the CRB and the ML algorithm do not
match. The SNR at which the ML variance and the CRB separate
is called the SNR threshold. In future work, the SNR threshold
will be studied and tighter limits will be obtained using a
Bayesian bound such as the Ziv-Zakai bound. Such bounds
offer more accurate information on the lower bounds for
channel estimation schemes at low SNR.
ABSTRACT

In this paper, we demonstrate the use of support vector (SV)
techniques for the binary classification of nonstationary
sinusoidal signals with quadratic phase. We briefly describe
the theory underpinning SV classification, and introduce
Cohen’s group time-frequency representation, which is used
to process the non-stationary signals so as to define the
classifier input space. We show that the SV classifier
outperforms alternative classification methods on this processed
data.

1.  INTRODUCTION 

The classification of nonstationary signals is a difficult and
much studied problem. On one hand, the nonstationarity
precludes classification in the time or frequency domain;
on the other hand, nonparametric representations such as
time-frequency or time-scale representations, while suited
to nonstationary signals, have high dimension. Time-Frequency
Representations (TFRs) and distance measures adapted to
their comparison have previously been used to classify
nonstationary signals [1, 2, 6]; however, the decision rules
chosen in these studies limit the performance of these
classification algorithms.

Support vector machines (SVMs) [10] provide efficient
and powerful classification algorithms, which are capable of
dealing with high dimensional input features, and with
theoretical bounds on the generalisation error and sparseness of
the solution provided by statistical learning theory [12, 10].
Classifiers based on SVMs have few free parameters requiring
tuning, are simple to implement, and are trained through
optimisation of a convex, quadratic cost function, which
ensures the uniqueness of the SVM solution. Furthermore,
SVM based solutions are sparse in the training data, and are
defined only by the most “informative” training points.


In this paper, we propose to use a support vector machine
for binary classification of the TFRs of nonstationary
signals. In Section 2, we review support vector classifiers.
In Section 3, we propose a classifier implementation based
on Cohen’s group TFRs, and in Section 4 we compare the
classification results obtained with the SVM-TFR approach
to those found using other classification methods.

2.  SUPPORT  VECTOR  CLASSIFICATION 

We first describe how support vector machines may be used
in binary classification, using the $\nu$-SV procedure. The
results in this section are derived in Schölkopf et al. [9], and
are also described in detail in Schölkopf and Smola [10].
Assume a sample of $N$ labeled training points,

$$z = ((\mathbf{x}_1, y_1), \cdots, (\mathbf{x}_N, y_N)) \in (\mathcal{X} \times \mathcal{Y})^N,$$

in which $\mathbf{x}_i \in \mathcal{X}$, where $\mathcal{X}$ is the input space, and $y_i \in \mathcal{Y}$,
where $\mathcal{Y}$ is the label space. For our purposes, we define
$\mathcal{Y} = \{-1, 1\}$, which corresponds to a two class
classification problem. We seek to determine a function

$$\psi : \mathcal{X} \to \mathcal{Y}, \qquad \mathbf{x} \mapsto \psi(\mathbf{x}),$$

that best predicts the label $y$ for a vector $\mathbf{x}$. Assuming that
random variable pairs $(\mathsf{x}, \mathsf{y})$ are generated i.i.d. according to
a distribution $P_{\mathsf{x},\mathsf{y}}$, the optimal predicted class label for an
input $\mathbf{x}$ is

$$\psi(\mathbf{x}) = \arg\max_{y} P_{\mathsf{y}}(y \mid \mathsf{x} = \mathbf{x}).$$

Since  we  do  not  know  the  mapping  we  define  a  leam- 


0-7803-701 1-2/01/S10.00  ©2001  IEEE 


305 


ing  algorithm  A, 

OG 

A:  U  (X,y)N  ->  Ti 

A'=l 

z  •->  4’z  (•) , 

within  a  class  %  C  yx  (here  yx  refers  to  the  set  of  func¬ 
tions  mapping  from  X  to  30,  which  we  call  the  hypothesis 
space,  that  is  flexible  enough  to  model  a  wide  range  of  deci¬ 
sion  boundaries.  We  next  define  a feature  space  T,  endowed 
with  an  inner  product  1  (■,  -)t,  and  a  mapping  from  X  to  T, 

x  i-»  $(x). 

Let  us  restrict  Ti  to  functions  of  the  form 


among  others.  The  present  study  is  confined  to  the  case 
of  soft  margin  loss,  which  has  been  used  successfully  with 
support  vector  methods  in  a  wide  variety  of  classification 
problems  [10]. 

In  practice,  equation  (3)  cannot  readily  be  solved,  as  we 
do  not  usually  know  the  distribution  Px  y.  Minimising  the 
empirical  risk  alone  does  not  take  into  account  other  factors, 
such  as  the  complexity  of  the  classifying  function,  and  can 
therefore  result  in  overfitting  [10,  12], 

We  now  describe  the  optimisation  problem  to  be  un¬ 
dertaken  in  finding  /z(x).  All  support  vector  classification 
methods  involve  the  minimisation  of  a  regularised  risk  func¬ 
tional,  which  represents  a  tradeoff  between  classifier  com¬ 
plexity  and  training  error  (the  latter  is  determined  by  the 
cost  functional).  In  the  case  of  the  u-SW  method,  the  regu¬ 
larised  risk  i?rcg(/z  (•), z)  at  optimum  is  given  by 


Ti  :=  {x  sign  ((<fr(x),  w)  +  b)  |w  £  T ,  b  £  R}  . 

We  can  then  define  a  function  fz  ( x )  in  M* ,  such  that  ij:z  (•)  = 
A(z)  =  sign(/z(-));  thus 

fz{x)  =  ($(x),w)  +  b,  (1) 

and  the  problem  of  finding  a  nonlinear  decision  boundary 
in  X  has  been  transformed  into  a  problem  of  finding  the 
optimal  hyperplane  in  T  separating  the  two  classes,  where 
this  hyperplane  is  parametrised  by  (w,  b). 

The  mapping  #(■)  need  never  be  computed  explicitly; 
instead,  we  use  the  fact  that  if  T  is  the  reproducing  kernel 
Hilbert  space  induced  by  k (■,■),  then 

(*(xO,*(xj))  =  k(xj,Xj). 

The  latter  requirement  is  met  for  kernels  fulfilling  the  Mer¬ 
cer  conditions  [10].  These  conditions  are  satisfied  for  a  wide 
range  of  kernels,  including  Gaussian  radial  basis  functions, 

k(xi,Xj)  =  exp  ^  ^ )  •  (2) 


An  estimate  fz(-)  associated  with  the  loss  c  (x,  y,  fz  (•))  is 
attained  by  minimising  the  risk  R  (gz  (•)),  i.e. 


/*(•)  =  argmin 

gR-)eT 


R{9z  (•))  =  Ex,y  [c  (x,  y,gz  (x))] 


•  (3) 


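Possible loss functions include the soft margin loss [3, 5],

$$c(x, y, g_z(x)) = \begin{cases} 0 & \text{if } y\, g_z(x) \ge \rho, \\ \rho - y\, g_z(x) & \text{otherwise,} \end{cases} \qquad (4)$$

and the logistic regression loss [8],

$$c(x, y, g_z(x)) = \log\left(1 + \exp(-y\, g_z(x))\right), \qquad (5)$$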
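The two losses in equations (4) and (5) can be sketched as follows. The function names are illustrative; `g` stands for the real-valued output $g_z(x)$ and `rho` for the margin parameter $\rho$.

```python
import numpy as np


def soft_margin_loss(y, g, rho):
    """Soft margin loss: 0 if y*g >= rho, otherwise rho - y*g."""
    return np.maximum(0.0, rho - np.asarray(y) * np.asarray(g))


def logistic_loss(y, g):
    """Logistic regression loss log(1 + exp(-y*g)), computed stably."""
    return np.logaddexp(0.0, -np.asarray(y) * np.asarray(g))
```

Both functions accept scalars or arrays, so they can be averaged over a training set to obtain an empirical risk.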

1 We omit the inner product subscript in the subsequent discussion, unless the inner product is taken in a space other than $\mathcal{F}$.


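$$\min_{f_z(\cdot) \in \mathcal{F}} \left[ R_{\mathrm{reg}}(f_z(\cdot), z) \right] = \min_{w, b, \rho} \left[ \frac{1}{2} \| w \|^2 - \nu \rho + R_{\mathrm{emp}}(f_z(\cdot), z) \right], \qquad (6)$$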


where we use the soft margin loss from equation (4) in the empirical risk;

$$R_{\mathrm{emp}}(f_z(\cdot), z) = \frac{1}{N} \sum_{i=1}^{N} c(x_i, y_i, f_z(x_i)) = \frac{1}{N} \sum_{i=1}^{N} \xi_i,$$

in which

$$\xi_i = \max\{0,\ \rho - y_i f_z(x_i)\}.$$

All training points $(x_i, y_i)$ for which $y_i f_z(x_i) \le \rho$ are known as support vectors; it is only these points that determine $f_z(\cdot)$. The role of the term $\nu$ in equation (6) is described in the following theorem, from Schölkopf et al. [9].

Theorem 1 The following results hold only for solutions to the optimisation problem in equation (6) for which $\rho > 0$.

1. $\nu$ is an upper bound on the fraction of training points for which $y_i f_z(x_i) < \rho$, which we call margin errors.

2. $\nu$ is a lower bound on the fraction of training points for which $y_i f_z(x_i) \le \rho$ (the support vectors).

3. Assume a data set $z$ generated iid according to $P_{x,y}$, and that neither $P_x(x \mid y = 1)$ nor $P_x(x \mid y = -1)$ contains any discrete component. Then, given a kernel $k(\cdot, \cdot)$ that is analytic and non-constant, with probability 1, asymptotically, $\nu$ is equal to the fraction of support vectors and the fraction of margin errors.
It can be shown [9] that the component $w$ in equation (1) is a linear combination of the mapped training points,

$$w = \sum_{i=1}^{N} \alpha_i y_i \Phi(x_i),$$

and that solving equation (6) is equivalent to finding

$$\max_{\alpha} \left( -\frac{1}{2} \sum_{i,j=1}^{N} \alpha_i \alpha_j y_i y_j k(x_i, x_j) \right)$$

subject to

$$0 \le \alpha_i \le \frac{1}{N}, \qquad \sum_{i=1}^{N} y_i \alpha_i = 0, \qquad \sum_{i=1}^{N} \alpha_i \ge \nu.$$

Fig. 1. Support of the noise-free TFRs (the gray areas represent the possible instantaneous frequencies for each class).

There exist a number of methods that can be used to solve this quadratic programming problem. Our results were obtained using the LOQO algorithm in Vanderbei [11]. In the case of large training sets, data decomposition methods exist to speed convergence; see e.g. Chang et al. [4]. The offset $b$ and soft margin loss parameter $\rho$ are found using

$$y_j \left( \langle w, \Phi(x_j) \rangle + b \right) = \rho \quad \text{when } \alpha_j \in \left(0, \tfrac{1}{N}\right);$$

the set of equations thus obtained can be solved via linear least squares.
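The least-squares step for $b$ and $\rho$ can be sketched as below. This is an illustration under stated assumptions rather than the authors' implementation: it assumes the dual variables `alpha` are already available from a quadratic programming solver, that $w = \sum_i \alpha_i y_i \Phi(x_i)$, and that the upper box bound on $\alpha_j$ is $1/N$ as in the standard $\nu$-SVM dual. The name `offset_and_margin` and the tolerance `tol` are hypothetical.

```python
import numpy as np


def offset_and_margin(K, y, alpha, tol=1e-8):
    """Solve y_j * (s_j + b) = rho for (b, rho) in the least-squares sense.

    K     : (N, N) kernel Gram matrix
    y     : (N,) labels in {-1, +1}
    alpha : (N,) dual variables, assumed to lie in [0, 1/N]
    Only points with 0 < alpha_j < 1/N contribute an equation.
    """
    y = np.asarray(y, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    n = len(y)
    s = K @ (alpha * y)  # s_j = <w, Phi(x_j)>
    on_margin = (alpha > tol) & (alpha < 1.0 / n - tol)
    # Each margin point gives the linear equation  y_j * b - rho = -y_j * s_j.
    A = np.column_stack([y[on_margin], -np.ones(on_margin.sum())])
    rhs = -y[on_margin] * s[on_margin]
    (b, rho), *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return float(b), float(rho)
```

Using every margin support vector, rather than a single one, averages out small numerical errors in the dual solution.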


where the notation $N C_s^{\phi}(t, f)$ is used to show that the TFR is normalised;

$$N C_s^{\phi}(t, f) = \frac{C_s^{\phi}(t, f)}{\iint C_s^{\phi}(t, f)\, dt\, df}. \qquad (8)$$

In this formulation, the input space $\mathcal{X}$ defined in the previous section is the space of normalised TFRs (i.e., $x = N C_s^{\phi}(t, f)$), which depends on the choice of the TFR kernel $\phi$.


4.  RESULTS 

We now apply the $\nu$-SVM algorithm to the binary classification of chirp signals, and compare our results to those obtained previously by Davy et al. [6, 7]. The test signals are defined as the sum of two linear chirps:

$$x(k) = A \sin\left[ 2\pi (a_0 + a_1 k) \right] + B \sin\left[ 2\pi (b_0 + b_1 k + b_2 k^2) \right] + \epsilon(k), \qquad k = 0, \ldots, K - 1, \qquad (9)$$


3.  KERNEL  DESIGN 

The $\nu$-SVM classification procedure relies on the choice of a kernel suited to the problem at hand. In Davy et al. [6], a nonstationary signal classification algorithm was introduced, based on Cohen's group time-frequency representations. In this paper, we choose a $\nu$-SVM kernel $k(\cdot, \cdot)$ based on a similar approach.

We write the Cohen's group time-frequency representation of $s(t)$ as $C_s^{\phi}(t, f)$ (parametrised by its TFR kernel2 $\phi$). Given two signals $s(t)$ and $s'(t)$, the Gaussian radial basis function kernel of equation (2) then becomes

$$k(x, x') = \exp\left( -\frac{1}{2\sigma^2} \iint \left| N C_s^{\phi}(t, f) - N C_{s'}^{\phi}(t, f) \right|^2 dt\, df \right), \qquad (7)$$

2 In order to avoid confusion between the $\nu$-SVM kernel and the TFR kernel, the latter will be referred to as the TFR kernel at all times.
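A discretised version of the kernel in equation (7) might look as follows. This is a sketch only: it assumes the two TFRs have already been computed on a common time-frequency grid (the paper's choice of TFR kernel $\phi$ is not reproduced here), and the names `normalise_tfr`, `tfr_kernel`, `dt` and `df` are illustrative.

```python
import numpy as np


def normalise_tfr(C, dt=1.0, df=1.0):
    """Divide a sampled TFR by its (Riemann-sum) integral over time and frequency."""
    C = np.asarray(C, dtype=float)
    return C / (np.sum(C) * dt * df)


def tfr_kernel(C1, C2, sigma2, dt=1.0, df=1.0):
    """Gaussian RBF kernel between two normalised TFRs sampled on the same grid."""
    diff = normalise_tfr(C1, dt, df) - normalise_tfr(C2, dt, df)
    sq_dist = np.sum(np.abs(diff) ** 2) * dt * df
    return np.exp(-sq_dist / (2.0 * sigma2))
```

Because of the normalisation, rescaling a signal's TFR by a constant leaves the kernel value unchanged.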


where the $\epsilon(k)$ are iid, and are generated by a zero mean Gaussian process with variance $\sigma_\epsilon^2$. Each test signal $x(k)$ is parametrized by $\theta = (A, B, a, b, \sigma_\epsilon^2)$, with $a = (a_0, a_1)$ and $b = (b_0, b_1, b_2)$. The problem consists of classifying a given signal $x(k)$ into one of the two following classes:

•  Class  wi :  p(b2)  ~  U(-^~: jj,  2(jc-i))’  where (7(a,b) 
is  the  uniform  distribution  on  (a,  b), 

•  Class  w2  :  p(b2)  ~  U(  afgrfj,  )• 

The remaining signal parameters are identical in both classes, i.e. $A = B = 1$, $a_0, b_0 \sim U(0, 1)$, $a_1 = 0.25$ and $b_1 = 0.40$. The support of the noise-free time-frequency representation for signals in each class is plotted in figure 1.

The $\nu$-SVM algorithm was trained using 100 signals, with an equal number of examples in each class. We specified a kernel width of $\sigma^2 = 0.1$ (see equation (7)), and set $\nu = 0.2$. A radially symmetric Gaussian TFR kernel $\phi$ was selected, with parameters optimised to minimise the error rate observed on the test data.

To measure the performance of the algorithm, a total of 20000 randomly generated test signals were used, again divided equally between the two classes (note that the training signals did not form part of the test set). Table 1 shows the average error over these test signals, compared with the average obtained over the same number of test signals for alternative classification methods. We see that for this problem, the $\nu$-SVM algorithm achieves the lowest error rate.


Classification method              Error rate
Wigner distribution [1]            22.30 %
Ambiguity plane [2]                 4.56 %
Time-Frequency [6]                  2.25 %
MCMC classification [7]             5.24 %
SVM classification [this paper]     1.51 %

Table 1. Error rates for the classification of chirps, using the proposed SVM implementation and other classifiers.


5.  CONCLUSION 

In this study, we show that the good performance of SV classifiers in high dimensions allows us to effectively classify chirp signals, when these are transformed using Cohen's group time-frequency kernels. Additional advantages of the SV classification method include simplicity of implementation, relatively low computational cost, and uniqueness of the SVM solution.
ABSTRACT

This paper addresses optimal estimation for time varying autoregressive (TVAR) models. First, we propose a statistical model on the time evolution of the frequencies, moduli and real poles instead of a standard model on the AR coefficients as it makes more sense from a physical viewpoint. Second, optimal estimation involves solving a complex optimal filtering problem which does not admit any closed-form solution. We propose a new particle filtering scheme which is an improvement over the so-called auxiliary particle filter. The hyperparameters tuning the evolution of the model parameters are also estimated on-line so as to robustify the model. Simulations demonstrate the efficiency of both our model and algorithm.

1.  INTRODUCTION 

Many models in signal processing can be cast in a state space form. In most applications, prior knowledge of the system is also available. This knowledge allows us to adopt a Bayesian approach; that is, to combine a prior distribution for the unknown quantities with a likelihood function relating these quantities to the observations. Within this setting one performs inference on the unknown state via the posterior distribution. Often, the observations arrive sequentially in time and one is interested in estimating recursively in time the evolving posterior distribution. This problem is known as the Bayesian or optimal filtering problem [1]. In many realistic problems, state space models must include elements of non-linearity and non-Gaussianity that preclude a closed-form expression for the optimal filter. For over thirty years, many approximation schemes, such as the extended Kalman filter, have been proposed to tackle this problem; see [1]. Unfortunately, in many cases, these suboptimal methods are unreliable.

Following the seminal paper by Gordon, Salmond and Smith introducing the bootstrap filter/SIR [5], there has been a surge of interest in particle filtering methods to solve the optimal filtering problem numerically; see [3], [4]. These methods are Sequential Monte Carlo (SMC) methods that utilize a large number, N, of random samples (or particles) to represent the posterior probability distributions. They are very flexible and can be easily applied to nonlinear and non-Gaussian dynamic models.

We sum up here the contributions of our paper. We first propose an original model for TVAR. It relies on a pole-type parameterization of the problem which is physically sound, versatile and robust. More precisely, our model takes into account model uncertainty: the order of the TVAR is assumed unknown and is estimated on-line. The hyperparameters which might influence the results are also part of the inference process, therefore robustifying the model. The proposed model is complex and requires the use of state-of-the-art particle filtering techniques. We introduce here a modification of the auxiliary particle filtering method relying on an approximate computation of the “one-step ahead likelihood”. It is shown to lead to substantial improvements in simulation.

2.  IMPROVED  AUXILIARY  PARTICLE  FILTERING 

2.1.  Problem  Statement 

Let $(\Omega, \mathcal{F}, P)$ be a probability space on which we have defined two real vector-valued stochastic processes $X = \{X_t, t \in \mathbb{N}\}$ and $Y = \{Y_t, t \in \mathbb{N}^*\}$. The process $X$ is usually called the signal process and the process $Y$ is called the observation process. Let $\mathbb{R}^{n_x}$ and $\mathbb{R}^{n_y}$ be the state spaces of $X$ and $Y$, with dimensions $n_x$ and $n_y$. The signal process $X$ is a Markov process with initial density $p(x_0)$ and probability transition density $p(x_t | x_{t-1})$. The observations are independent conditional upon $X$ and have marginal density $g(y_t | x_t)$.

For $p < q$ and any sequence $z_t$, we denote $z_{p:q} = (z_p, z_{p+1}, \ldots, z_q)$. Bayes' theorem allows us to propagate over time the joint posterior distribution $p(x_{0:t} | y_{1:t})$

$$p(x_{0:t} | y_{1:t}) \propto g(y_t | x_t)\, p(x_t | x_{t-1})\, p(x_{0:t-1} | y_{1:t-1})$$

and the marginal filtering distribution

$$p(x_t | y_{1:t}) \propto g(y_t | x_t) \int p(x_t | x_{t-1})\, p(x_{t-1} | y_{1:t-1})\, dx_{t-1}$$

where $\propto$ denotes “proportional to”. Except in very special cases, these densities do not admit any closed-form expression and some numerical methods are required to approximate them.

2.2.  Particle  Filtering  Method 

Particle filtering methods are, loosely speaking, a set of sampling/resampling methods. These are recursive algorithms which produce, at each time $t$, a cloud of particles $\{x_{0:t}^{(i)}, w_t^{(i)}\}_{i=1}^{N}$ whose empirical measure

$$P_N(dx_{0:t} | y_{1:t}) = \sum_{i=1}^{N} w_t^{(i)} \delta_{x_{0:t}^{(i)}}(dx_{0:t}), \qquad w_t^{(i)} \ge 0, \qquad \sum_{i=1}^{N} w_t^{(i)} = 1$$

closely “follows” the distribution

$$P(dx_{0:t} | y_{1:t}) = p(x_{0:t} | y_{1:t})\, dx_{0:t}$$

2.2.1.  Sequential  Importance  Sampling/Resampling 


$q(x_t | y_{1:t}, x_{0:t-1})$. It is easy to check that the weights must then satisfy

$$w_t^{(i)} \propto \frac{g(y_t | x_t^{(i)})\, p(x_t^{(i)} | x_{t-1}^{(i)})}{\hat{p}(y_t | x_{t-1}^{(i)})\, q(x_t^{(i)} | y_{1:t}, x_{0:t-1}^{(i)})}$$

for this procedure to be statistically consistent.

This method will only work well when the approximation of $p(y_t | x_{t-1}^{(i)})$ is correct. One has

$$p(y_t | x_{t-1}) = \int p(y_t | x_t)\, p(x_t | x_{t-1})\, dx_t.$$


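At time $t - 1$, assume one has the following approximation of $P(dx_{0:t-1} | y_{1:t-1})$:

$$P_N(dx_{0:t-1} | y_{1:t-1}) = \sum_{i=1}^{N} w_{t-1}^{(i)} \delta_{x_{0:t-1}^{(i)}}(dx_{0:t-1}).$$

We extend the current paths by sampling $x_t^{(i)} \sim q(x_t | y_{1:t}, x_{0:t-1}^{(i)})$. Then, using the importance sampling identity

$$p(x_{0:t} | y_{1:t}) \propto p(x_{0:t-1} | y_{1:t-1})\, q(x_t | y_{1:t}, x_{0:t-1})\, \frac{g(y_t | x_t)\, p(x_t | x_{t-1})}{q(x_t | y_{1:t}, x_{0:t-1})},$$

one gets for the new weights

$$w_t^{(i)} \propto w_{t-1}^{(i)}\, \frac{g(y_t | x_t^{(i)})\, p(x_t^{(i)} | x_{t-1}^{(i)})}{q(x_t^{(i)} | y_{1:t}, x_{0:t-1}^{(i)})}.$$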

Then one uses a resampling step: particles with high weights are copied several times whereas particles with low weights are discarded. After this resampling step, the weights are reset to $N^{-1}$.
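The resampling step can be sketched with multinomial resampling, which is one common choice; the paper does not say which resampling scheme it uses, so this is an assumption. The function name `resample` is illustrative.

```python
import numpy as np


def resample(particles, weights, rng):
    """Multinomial resampling.

    Draw N indices with probabilities given by the normalised weights,
    copy the selected particles, and reset all weights to 1/N.
    """
    particles = np.asarray(particles)
    weights = np.asarray(weights, dtype=float)
    n = len(weights)
    idx = rng.choice(n, size=n, p=weights)
    return particles[idx], np.full(n, 1.0 / n)
```

Particles with large weights tend to be selected several times, while particles with negligible weight are unlikely to survive.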

In the case where one uses the “optimal” importance distribution [3]


In [6], it is suggested to approximate this integral by $p(y_t | x_t = \mu(x_{t-1}))$ where $\mu(x_{t-1})$ is the mode or median of $p(x_t | x_{t-1})$. This approximation might be very poor if $p(x_t | x_{t-1})$ is rather diffuse and $p(y_t | x_t)$ varies a lot over the prior $p(x_t | x_{t-1})$. It would of course be possible to approximate $p(y_t | x_{t-1})$ using a (second stage) Monte Carlo method, but this would be highly computationally intensive.

It is possible to approximate this expression using numerical methods. We propose here a simple though efficient deterministic method known as the unscented transform. This has the advantage of computing both $\hat{p}(y_t | x_{t-1})$ and $q(x_t | y_{1:t}, x_{0:t-1})$; see [7, 8] for details.

Improved auxiliary particle filtering algorithm

At time $t = 0$, Step 0: Initialization

• For $i = 1, \ldots, N$, sample $x_0^{(i)} \sim p(x_0)$ and set $t = 1$.

At time $t \ge 1$, Step 1: Auxiliary variable resampling step

• For $i = 1, \ldots, N$, compute $\lambda_t^{(i)}$ as

$$\lambda_t^{(i)} \propto w_{t-1}^{(i)}\, \hat{p}(y_t | x_{t-1}^{(i)}), \qquad \sum_{i=1}^{N} \lambda_t^{(i)} = 1 \qquad (1)$$

$$q(x_t | y_{1:t}, x_{0:t-1}) = p(x_t | y_{1:t}, x_{t-1}) = \frac{g(y_t | x_t)\, p(x_t | x_{t-1})}{p(y_t | x_{t-1})},$$

then it is easy to see that

$$w_t^{(i)} \propto p(y_t | x_{t-1}^{(i)}).$$

That is, the weight is independent of $x_t^{(i)}$. This suggests that resampling should be performed before sampling $\{x_t^{(i)}\}$, as it will obviously lead to an increased number of distinct particles. Unfortunately, this method cannot be used for most models as $p(y_t | x_{t-1})$ does not admit a closed-form expression.

2.2.2.  An  improved  auxiliary  particle  filtering  method 

We present here an improved version of the Auxiliary Particle Filtering (APF) method [6]. In the APF, one approximates the “predictive likelihoods” $\{p(y_t | x_{t-1}^{(i)})\}_{i=1}^{N}$ by, say, $\{\hat{p}(y_t | x_{t-1}^{(i)})\}_{i=1}^{N}$. Then one resamples the particles $\{x_{0:t-1}^{(i)}\}_{i=1}^{N}$ w.r.t. $w_{t-1}^{(i)}\, \hat{p}(y_t | x_{t-1}^{(i)})$, the aim being to boost the number of particles in useful regions of the state space. Then, if we sample $\{x_t^{(i)}\}_{i=1}^{N}$ according to


where $\hat{p}(y_t | x_{t-1}^{(i)})$ is computed using the unscented approximation.

• Multiply/Discard particles $\{x_{0:t-1}^{(i)}\}_{i=1}^{N}$ with respect to high/low importance weights $\lambda_t^{(i)}$ to obtain $N$ particles.

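Step 2: Importance sampling step

• For $i = 1, \ldots, N$, use the unscented Kalman filter to compute $\hat{x}_t^{(i)}$ and $P_t^{(i)}$ that are respectively the estimates of $E(x_t | y_{1:t}, x_{t-1}^{(i)})$ and $\mathrm{Cov}(x_t | y_{1:t}, x_{t-1}^{(i)})$.

• For $i = 1, \ldots, N$, sample $x_t^{(i)} \sim q(x_t | y_{1:t}, x_{0:t-1}^{(i)})$ where the importance distribution is

$$q(x_t | y_{1:t}, x_{0:t-1}^{(i)}) = \mathcal{N}(x_t;\ \hat{x}_t^{(i)}, P_t^{(i)})$$

• Update the importance weights as

$$w_t^{(i)} \propto \frac{g(y_t | x_t^{(i)})\, p(x_t^{(i)} | x_{t-1}^{(i)})}{\hat{p}(y_t | x_{t-1}^{(i)})\, q(x_t^{(i)} | y_{1:t}, x_{0:t-1}^{(i)})}$$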
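The two steps above can be sketched on a toy problem. The example below is not the authors' implementation: it assumes a scalar linear-Gaussian model, $x_t = \phi x_{t-1} + u_t$ with $u_t \sim \mathcal{N}(0, q)$ and $y_t = x_t + v_t$ with $v_t \sim \mathcal{N}(0, r)$, for which the predictive likelihood and the Gaussian proposal are available in closed form and therefore stand in for the unscented approximations. The names `apf_step`, `phi`, `q` and `r` are illustrative.

```python
import numpy as np


def log_normal_pdf(x, mean, var):
    """Log density of a scalar Gaussian N(mean, var) evaluated at x."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)


def apf_step(x_prev, w_prev, y, phi, q, r, rng):
    """One auxiliary particle filter step for the toy linear-Gaussian model."""
    n = len(x_prev)
    # Step 1: first-stage weights lambda ∝ w_{t-1} * p(y_t | x_{t-1}),
    # where y_t | x_{t-1} ~ N(phi * x_{t-1}, q + r) for this model.
    log_lam = np.log(w_prev) + log_normal_pdf(y, phi * x_prev, q + r)
    lam = np.exp(log_lam - log_lam.max())
    lam /= lam.sum()
    x_sel = x_prev[rng.choice(n, size=n, p=lam)]
    # Step 2: sample from the Gaussian proposal N(mean_q, var_q),
    # here equal to the exact conditional p(x_t | y_t, x_{t-1}).
    var_q = 1.0 / (1.0 / q + 1.0 / r)
    mean_q = var_q * (phi * x_sel / q + y / r)
    x_new = mean_q + np.sqrt(var_q) * rng.standard_normal(n)
    # Second-stage weights: g * p / (p_hat * q), computed in the log domain.
    log_w = (log_normal_pdf(y, x_new, r)
             + log_normal_pdf(x_new, phi * x_sel, q)
             - log_normal_pdf(y, phi * x_sel, q + r)
             - log_normal_pdf(x_new, mean_q, var_q))
    w = np.exp(log_w - log_w.max())
    return x_new, w / w.sum()
```

In this special case the predictive likelihood and proposal are exact, so the second-stage weights come out uniform; with an approximate $\hat{p}$ and $q$ they would correct for the approximation error.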
3. APPLICATION TO BAYESIAN TIME-VARYING SPECTRAL ANALYSIS

the model so as to define an extended Bayesian model that includes the hyperparameters. The above model becomes:


Time-varying models have received much attention as tools for nonstationary signal analysis [9, 10]. Time-varying AR models consist of the following recursive process:

$$y_t = a_{1,t}\, y_{t-1} + a_{2,t}\, y_{t-2} + \cdots + a_{p,t}\, y_{t-p} + v_t \qquad (2)$$

where $a_t = [a_{1,t}, a_{2,t}, \ldots, a_{p,t}]^T$ is the vector of AR coefficients at time $t$, $v_t$ is a zero-mean Gaussian white noise of variance $R$, and $p$ is the AR model order. Denote $\mathbf{y}_t = [y_{t-1}\ y_{t-2}\ \ldots\ y_{t-p}]^T$; then Eq. (2) can be written as:


$$p_t^c, p_t^r \sim p(p_t^c, p_t^r | p_{t-1}^c, p_{t-1}^r) \qquad (7)$$

$$x_t \sim p(x_t | x_{t-1}, p_t^c, p_t^r, Q_{t-1}) \qquad (8)$$

$$Q_t \sim p(Q_t | Q_{t-1}, p_t^c, p_t^r) \qquad (9)$$

$$R_t \sim p(R_t | R_{t-1}) \qquad (10)$$

$$y_t \sim g(y_t | x_t, p_t^c, p_t^r, R_t, y_{1:t-1}) \qquad (11)$$

where Eq. (8) is similar to Eq. (4), and the likelihood given in Eq. (11) is computed in the same way as in Eq. (5). Eqs. (9) and (10) correspond to a random walk (written, e.g., for $R_t$):


$$y_t = a_t^T \mathbf{y}_t + v_t \qquad (3)$$

The AR coefficients can be equivalently expressed in terms of frequencies $\nu_t = [\nu_{1,t}\ \nu_{2,t}\ \ldots\ \nu_{p^c,t}]^T$, moduli $\rho_t = [\rho_{1,t}\ \rho_{2,t}\ \ldots\ \rho_{p^c,t}]^T$ and real poles $r_t = [r_{1,t}\ r_{2,t}\ \ldots\ r_{p^r,t}]^T$, with $p = 2p^c + p^r$. The poles $r_{k,t}$, $z_{k,t} = \rho_{k,t} e^{j 2\pi \nu_{k,t}}$ and its complex conjugate $z_{k,t}^*$ are the roots of the polynomial:

$$1 - a_{1,t} X - a_{2,t} X^2 - \ldots - a_{p,t} X^p$$

In the following, the transform $(\nu_t, \rho_t, r_t) \to a_t$ is denoted $a_t = \mathrm{AR}(\nu_t, \rho_t, r_t)$. This latter formulation enables a physically sound model of the time evolution of the AR coefficients.
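The transform $a_t = \mathrm{AR}(\nu_t, \rho_t, r_t)$ can be sketched as below. One assumption is made explicit: the code uses the common convention in which the poles are the roots of $z^p - a_1 z^{p-1} - \ldots - a_p$, i.e. the polynomial above with $X = z^{-1}$. The function name `ar_from_poles` is illustrative.

```python
import numpy as np


def ar_from_poles(freqs, moduli, real_poles):
    """Map frequencies, moduli and real poles to AR coefficients a_1..a_p.

    Complex poles are rho_k * exp(+/- j 2 pi nu_k); real poles are used as is.
    Assumes the poles are the roots of z^p - a_1 z^(p-1) - ... - a_p.
    """
    z = np.asarray(moduli, dtype=float) * np.exp(2j * np.pi * np.asarray(freqs, dtype=float))
    poles = np.concatenate([z, np.conj(z), np.asarray(real_poles, dtype=float)])
    monic = np.poly(poles)  # coefficients of prod_k (z - pole_k), leading 1 first
    # Conjugate pairs make the coefficients real up to rounding.
    return -np.real(monic[1:])
```

For a single conjugate pair with modulus $\rho$ and frequency $\nu$, this returns $a_1 = 2\rho\cos(2\pi\nu)$ and $a_2 = -\rho^2$, the familiar second-order resonator.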

3.1.  Bayesian  model 

A simple state space representation is given by:

$$x_t = A x_{t-1} + B u_t \qquad (4)$$

$$y_t = \mathrm{AR}(x_t)^T \mathbf{y}_t + v_t \qquad (5)$$

where $u_t$ is a zero-mean Gaussian white noise (referred to as dynamic noise) with diagonal covariance matrix $Q$. Given an integer $M \ge 1$, the state vector $x_t$ is $M \times p$-dimensional and consists of the frequencies, moduli and real poles from time $t - M + 1$ to time $t$ as:

$$x_t = \left[ \nu_{t-M+1:t}^T\ \ \rho_{t-M+1:t}^T\ \ r_{t-M+1:t}^T \right]^T \qquad (6)$$

The matrices $A$ and $B$ are such that, e.g.,


v\,t 


M 


+  «i,t 


This is a simple smoothness prior. Rather than relying on the AR coefficients, the model defined in Eqs. (4) and (5) is based on a modulus-frequency representation, which is more convenient for modelling the signal evolution in time. This model is however highly non-linear due to the transform $a_t = \mathrm{AR}(x_t)$. The estimate $\hat{x}_t$ of $x_t$ given $y_{1:t}$, $x_{0:t-1}$ requires filters adapted to nonlinear Gaussian models, such as those presented in the previous section.


3.2.  Extended  Bayesian  model 

In most applications, however, the hyperparameters $Q$, $R$ and the model order $p$ are not known a priori. It is still possible to tune these hyperparameters “by hand”, but they may not be constant w.r.t. time: for example, $p$ might change when a spectral trajectory appears or disappears. A possible solution consists of extending


$$\log(R_t) = \log(R_{t-1}) + \epsilon_t \qquad (12)$$

where $\epsilon_t$ is a centered Normal noise with variance $\delta_R^2$. Eq. (7) is a bivariate discrete distribution which enables five possible moves with equal probability: ($p_t^c = p_{t-1}^c$ and $p_t^r = p_{t-1}^r$), ($p_t^c = p_{t-1}^c$ and $p_t^r = p_{t-1}^r + 1$), ($p_t^c = p_{t-1}^c$ and $p_t^r = p_{t-1}^r - 1$), ($p_t^c = p_{t-1}^c + 1$ and $p_t^r = p_{t-1}^r$) and ($p_t^c = p_{t-1}^c - 1$ and $p_t^r = p_{t-1}^r$), corresponding to update, birth or death of a trajectory. This prior does not enable trajectory birth (resp. death) when the total number of trajectories, i.e. the TVAR model order, reaches an upper bound (resp. a lower bound).


3.3.  Implementation 

The algorithm presented in Section 2 is applied to the model described above (the actual state vector includes the frequencies, the moduli and real poles from time $t - M + 1$ to time $t$, the logarithms of the diagonal terms of the matrix $Q_t$, $\log(R_t)$ and the orders $p_t^c$ and $p_t^r$). For the sake of simplicity, we consider constant orders $p^c$ and $p^r$ in this section. The proposal distribution for $[\nu_t, \rho_t, r_t]$ is a multivariate normal density with parameters estimated using the unscented Kalman filter, as described in Section 2. The hyperparameters are sampled using an accept/reject method, according to, e.g., for $R_t$:


$$p(R_t | R_{t-1}, v_t) \propto \exp\left\{ -\frac{1}{2} \left[ \frac{v_t^2}{R_t} + \log R_t + \frac{(\log R_t - \log R_{t-1})^2}{\delta_R^2} \right] \right\}$$

where the innovation is $v_t = y_t - \mathrm{AR}(x_t)^T \mathbf{y}_t$, from Eq. (5). The algorithm is initialized using the frequencies, moduli and real poles computed from the AR coefficients estimated using the modified covariance AR estimator applied to the first $2p$ points of the signal.


3.4.  Simulations 

In this section, we present results obtained with a signal made of three frequency components (SNR = 24 dB). The model orders are fixed such that $p^c = 3$ and $p^r = 1$. Figure 1 represents the spectrogram of the analysed signal. The MMSE estimates of the frequencies, moduli and real poles at each time instant are computed using the proposed particle filter, with $N = 500$ particles, and $M = 8$ (see Figure 2). These parameters are accurately tracked, in spite of the signal nonstationarity. Figure 3 displays the hyperparameter estimates over time: in particular, the excitation noise variance $R_t$ is stable, which confirms the accuracy of the spectral trajectory tracking.
Fig. 1. Spectrogram of the processed signal.

Fig. 2. TVAR estimation of the frequencies (top), moduli (middle) and real pole (bottom) of a three-component signal.

Fig. 3. Evolution of the hyperparameters over time. The $\log_{10}$ of the hyperparameters tuning the frequencies is plotted in the top row. The second row corresponds to the moduli hyperparameters, and the third row corresponds to the real pole. $\log_{10}(R_t)$ is plotted in the bottom row.

4. CONCLUSION

In this paper, we have introduced an improved auxiliary particle filtering method. Its implementation requires one resampling step, and the proposal distribution is computed using a numerical approximation. Simulations demonstrate the efficiency of both the model and the algorithm: the frequencies, magnitudes and hyperparameters are accurately tracked. In particular, we would like to underline the robustness of the method, as the input information required from the user is minimal with our approach. The full implementation of the algorithm is still under development at the time of writing this paper, but results will be reported in [11].

5.  REFERENCES 

[1] B.D.O. Anderson and J.B. Moore, Optimal Filtering, Englewood Cliffs, 1979.

[2] D. Crisan, P. Del Moral and T. Lyons, “Discrete filtering using branching and interacting particle systems”, Markov Proc. Rel. Fields, vol. 5, pp. 293-318, 1999.

[3] A. Doucet, S.J. Godsill and C. Andrieu, “On sequential Monte Carlo sampling methods for Bayesian filtering”, Statistics & Computing, vol. 10, pp. 197-208, 2000.

[4] A. Doucet, J.F.G. de Freitas and N.J. Gordon (eds.), Sequential Monte Carlo Methods in Practice, New York: Springer-Verlag, 2001.

[5] N.J. Gordon, D.J. Salmond and A.F.M. Smith, “Novel approach to nonlinear/non-Gaussian Bayesian state estimation”, IEE Proceedings-F, vol. 140, no. 2, pp. 107-113, 1993.

[6] M.K. Pitt and N. Shephard, “Filtering via simulation: auxiliary particle filters”, J. Amer. Stat. Assoc., 1999.

[7] S.J. Julier, “The scaled unscented transformation”, Automatica, 2000, to appear.

[8] E.A. Wan and R. van der Merwe, “The unscented Kalman filter for nonlinear estimation”, in Proc. Conf. Adaptive Systems for Signal Processing, Communication and Control, Canada, 2000.

[9] N. Ikoma and H. Maeda, “Nonstationary spectral peak estimation by Monte Carlo Filter”, in Proc. Conf. Adaptive Systems for Signal Processing, Communication and Control, Canada, 2000.

[10] R. Prado, G. Huerta and M. West, “Bayesian Time-Varying Autoregressions: Theory, Methods and Applications”, to appear in special issue on Time Series and Related Topics of “Resenhas”, the Journal of the Institute of Mathematics and Statistics of the University of São Paulo.

[11] C. Andrieu, M. Davy and A. Doucet, “Bayesian on-line estimation of TVAR model order”, Technical Report, University of Cambridge, 2001.




SPATIAL AND TIME-FREQUENCY SIGNATURE ESTIMATION OF NONSTATIONARY SOURCES

Moeness G. Amin, Weifeng Mu, and Yimin Zhang

Department of Electrical and Computer Engineering,
Villanova University, Villanova, PA 19085, USA
E-mail: {moeness, weifeng, zhang}@ece.villanova.edu


ABSTRACT 

Signal synthesis using time-frequency distributions can be improved using an antenna array receiver. The availability of the source signals at different array elements allows the implementation of time-frequency synthesis techniques that utilize the source spatial signatures for crossterm reduction and noise mitigation. In this paper, we introduce a new technique for signal synthesis based on array averaging of Wigner distributions. The source temporal waveforms are first synthesized and then used to estimate the source spatial signatures. An iterative process incorporating the source signal vector and array vector can be applied until the desired results are reached.

1.  INTRODUCTION 

Synthesizing the signal from the Wigner-Ville distribution (WVD) is often impeded by the presence of high levels of noise and crossterms. These undesired terms not only obscure the true signal power localization in the time-frequency (t-f) domain, but also reduce the synthesized signal quality. Signal synthesis using time-frequency distributions (TFDs) can be improved using an antenna array receiver. The availability of the source signals at different array elements allows the implementation of t-f synthesis techniques that utilize the source spatial signatures for crossterm reduction and noise mitigation.

In [1], the WVDs of the data received at different antennas are averaged prior to synthesis. It is shown that spatial averaging of the WVD decreases the noise levels, reduces the interactions of the source signals, and mitigates the crossterms. As such, it depicts enhanced t-f signatures of the sources incident on the multi-antenna receiver.

The procedure discussed in [1] is appropriate for synthesizing signal waveforms whose t-f signatures are distinct. In this case, the masked t-f region always contains the autoterm of the desired source signal, with the influence from other sources often negligible. However, if the source t-f signatures overlap, the mask is bound to capture undesired autoterms. This problem cannot be mitigated by spatial averaging of TFDs, and a modification of the proposed method is in order.

This work is supported by the Office of Naval Research under Grant N00014-98-1-0176, and the Air Force Research Laboratory under Grant no. F30602-00-1-0515.

This paper discusses an iterative process that incorporates both the estimated source signal vector and array vector. The source temporal waveforms are first synthesized and then used to estimate the source spatial signatures.

The paper is organized as follows. A review of the technique proposed in [2, 3] for bilinear signal synthesis is given in Section 2. In Section 3, we introduce the array averaged WVD that reduces the effect of crossterms and noise. Section 4 discusses the iterative synthesis process for signals with overlapping t-f signatures. Section 5 presents simulation results.

2.  SIGNAL  SYNTHESIS  BASED  ON  WVD 

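The signal synthesis techniques based on WVDs can be found in [2, 3]. In this paper, we apply the method of extended discrete-time Wigner distribution (EDTWD), introduced in [4]. The EDTWD for received data $x(t)$ is defined as

$$W_{xx}(t, f) = \sum_{k:\, t + \frac{k}{2} \in \mathbb{Z}} x\left(t + \tfrac{k}{2}\right) x^*\left(t - \tfrac{k}{2}\right) e^{-j 2\pi k f}, \qquad t = 0, \pm\tfrac{1}{2}, \pm 1, \ldots, \qquad (1)$$

where $*$ denotes complex conjugation, and $t$ and $f$ represent the time index and the frequency index, respectively. Equation (1) is often referred to as the auto EDTWD of the signal $x(t)$. Similarly, the cross EDTWD of any two signals $x_1(t)$ and $x_2(t)$ is defined as

$$W_{x_1 x_2}(t, f) = \sum_{k:\, t + \frac{k}{2} \in \mathbb{Z}} x_1\left(t + \tfrac{k}{2}\right) x_2^*\left(t - \tfrac{k}{2}\right) e^{-j 2\pi k f}, \qquad t = 0, \pm\tfrac{1}{2}, \pm 1, \ldots. \qquad (2)$$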
The advantage of using the EDTWD in signal synthesis lies in the fact that it does not require a priori knowledge of the source waveform, and thereby avoids the problem of matching the two “uncoupled” even-indexed and odd-indexed vectors. In this paper, we refer to EDTWD as WVD for simplicity.

The overall procedure of WVD-based signal synthesis is summarized in the following steps.

1. Place an appropriate mask on $W_{xx}(t, f)$ such that only the desired signal autoterms are retained.

2. Take the inverse fast Fourier transform (IFFT) of $W_{xx}(t, f)$

$$p(t, \tau) = \int W_{xx}(t, f)\, e^{j 2\pi \tau f}\, df. \qquad (3)$$

3. Construct the matrix $Q = [q_{i,j}]$ with

$$q_{i,j} = p\left( \tfrac{i + j}{2},\ i - j \right). \qquad (4)$$

4. Take the Hermitian component $Q_H$ of $Q$

$$Q_H = \tfrac{1}{2} \left[ Q + Q^H \right], \qquad (5)$$

where the superscript $H$ denotes transpose conjugation.

5. Apply eigen-decomposition to the matrix $Q_H$ and obtain the maximum eigenvalue $\lambda_{\max}$ and the associated eigenvector $u$. The synthesized signal is given by

$$\hat{x} = e^{j\phi} \sqrt{\lambda_{\max}}\, u, \qquad (6)$$

where $\phi$ is an unknown value representing the phase.
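Steps 4 and 5 can be sketched as follows, assuming the matrix $Q$ of step 3 has already been built from the masked WVD. The function name `synthesise_from_q` is illustrative, and the unknown phase $\phi$ is left unresolved, as in equation (6).

```python
import numpy as np


def synthesise_from_q(Q):
    """Return sqrt(lambda_max) * u for the Hermitian part of Q.

    The result equals the synthesized signal up to the unknown phase factor.
    """
    Q = np.asarray(Q, dtype=complex)
    Q_h = 0.5 * (Q + Q.conj().T)            # Hermitian component, equation (5)
    eigvals, eigvecs = np.linalg.eigh(Q_h)  # eigenvalues in ascending order
    lam_max = eigvals[-1]
    u = eigvecs[:, -1]
    return np.sqrt(max(lam_max, 0.0)) * u
```

When $Q$ is exactly the rank-one matrix $x x^H$, the output reproduces $x$ up to a unit-modulus factor; with a masked, noisy WVD it gives the best rank-one fit in the least-squares sense.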

3.  THE  ARRAY  AVERAGED  WVD 


where $\delta(\tau)$ is the Kronecker delta, $I$ denotes the identity matrix, and $A = [a_1, \ldots, a_L]$ denotes the $M \times L$ mixing matrix. The columns of matrix $A$ are the source spatial signatures and are given by

$$a_i = [a_{i1}, \ldots, a_{iM}]^T. \qquad (9)$$

We assume that matrix $A$ is of full column rank, which implies that the spatial signatures associated with the $L$ sources are linearly independent. To simplify the discussion, we transfer any scalar factor embedded in $a_i$ to the source signal and assume that $\| a_i \|_2^2 = M$. It is obvious that this exchange does not affect the data observed from the antenna array.

It  is  evident  that  when  L  >  1,  equation  (7)  repre¬ 
sents  a  multi-component  scenario  due  to  the  mixture  of 
the  signals  at  each  sensor.  Therefore,  a  quadratic  TFD 
at  the  individual  sensors  would  contain  not  only  the 
autoterms  of  all  source  signals,  but  also  the  interactions 
of  the  source  signals,  causing  undesirable  crossterms. 

For  the  purpose  of  subsequent  derivation,  we  first 
rewrite  the  noise-free  data  vector  in  (7)  as 

i. 

y{t)  =  As  (t)  =  ^a,s,(t),  (10) 

i=  1 

and  its  Arth  element  (i.e.,  the  data  received  at  sensor 
k,  k  =  1,  •  •  • ,  M)  is  given  by 

L 

Yk{t)  =  5Za/frSi(t).  (11) 

?=1 

Substituting  (11)  into  (1),  we  can  express  the  auto¬ 
sensor  WVD  of  the  signal  at  the  kt h  sensor,  y*  (t),  as 

L  L 

Wyty„  (f,  /)=££  (t,  /),  (12) 

«=i  i= i 


Assume  L  source  signals  incident  on  an  M-sensor  array. 
The  data  received  across  the  array  is  given  by  the  nar¬ 
rowband  model 

x(<)  =  y(t)  +  n(t)  =  As(f)  +  n(t),  t  =  (7) 

where  x(f)  =  [xi(t),  •••,  xM{t)]T  and  s (t)  =  [si(t), 

•  •  • ,  Si(t)]T  are  the  M  x  1  data  snapshot  vector  and 
the  Lxl  source  signal  vector  at  time  instant  t,  respec¬ 
tively,  where  we  assume  s,(f),i  =  1,---,L,  are  mono¬ 
component  signals.  In  (7),  the  superscript  T  denotes 
the  vector/matrix  transpose.  The  M  x  1  vector  n(t) 
is  the  noise  vector,  whose  elements  are  modeled  as  sta¬ 
tionary,  spatially  and  temporally  white  complex  Gaus¬ 
sian  processes  with  zero  mean  and  variance  of  o2 ,  i.e., 

E  [n(f  +  r)nH  (t.)]  =  ct2S(t)I  (8) 


where $W_{s_i s_j}(t,f)$ corresponds to the auto-source or cross-source
WVD, depending on whether $i = j$ or $i \neq j$. Averaging the
auto-sensor WVDs over the array yields

$$\overline{W}(t,f) = \frac{1}{M}\sum_{k=1}^{M} W_{y_k y_k}(t,f)
= \sum_{i=1}^{L}\sum_{j=1}^{L}\left(\frac{1}{M}\sum_{k=1}^{M} a_{ik}\,a_{jk}^{*}\right) W_{s_i s_j}(t,f)
= \sum_{i=1}^{L}\sum_{j=1}^{L}\beta_{ij}\, W_{s_i s_j}(t,f), \tag{13}$$

where

$$\beta_{ij} = \frac{1}{M}\sum_{k=1}^{M} a_{ik}\,a_{jk}^{*} = \frac{1}{M}\,\mathbf{a}_j^H\mathbf{a}_i \tag{14}$$

is defined as the spatial correlation coefficient.

Equation (13) shows that $\overline{W}(t,f)$ is a linear combination of
the auto-source and cross-source WVDs of all signal arrivals. Since

$$|\beta_{ij}| < 1,\ i \neq j, \quad \text{and} \quad \beta_{ij} = 1,\ i = j, \tag{15}$$

the constant coefficients in (13) for the auto-source WVDs are always
greater than those for the cross-source WVDs. For a large array or
widely separated sources, $|\beta_{ij}| \ll 1$. This property is
utilized by the array averaging process and is shown to improve the
signal synthesis performance.

Specifically, when all spatial signatures are orthogonal, i.e.,
$\beta_{ij} = 0$ for any $i \neq j$,

$$\overline{W}(t,f) = \sum_{k=1}^{L} W_{s_k s_k}(t,f), \tag{16}$$

which is solely the summation of the source signal autoterms. The
above equation highlights the fact that all source signal crossterms
are entirely eliminated from $\overline{W}(t,f)$ and only the autoterms
are maintained, which is most desirable from the synthesis
perspective.
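
The properties in (14)-(16) can be checked numerically. The sketch below assumes a uniform linear array with a plane-wave steering model (the helper `ula_steering` and the chosen angles are illustrative assumptions, not taken from the paper) and uses DFT columns as an example of exactly orthogonal signatures.

```python
import numpy as np

def ula_steering(theta_deg, M, spacing=0.5):
    """Hypothetical ULA signature; spacing in wavelengths, squared norm M."""
    k = np.arange(M)
    return np.exp(-2j * np.pi * spacing * k * np.sin(np.deg2rad(theta_deg)))

def spatial_correlation(A):
    """beta[i, j] = (1/M) a_j^H a_i as in eq. (14); A is M x L."""
    M = A.shape[0]
    return (A.conj().T @ A).T / M

M = 8
A = np.column_stack([ula_steering(th, M) for th in (-20.0, 20.0, 50.0)])
beta = spatial_correlation(A)            # unit diagonal, |off-diagonal| < 1

# Orthogonal signatures (DFT columns): beta is the identity, so the array
# average in (13) reduces to the autoterm sum in (16).
A_orth = np.fft.fft(np.eye(M))[:, :3]
beta_orth = spatial_correlation(A_orth)
```

The off-diagonal magnitudes of `beta` are the weights that multiply the cross-source WVDs after averaging, so smaller values mean stronger crossterm suppression.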

4.  ITERATIVE  SYNTHESIS  PROCESS 

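Assume that upon implementing the synthesis process described in
Section 2, we obtain the estimate $\hat{\mathbf{A}}$ of the mixing
matrix $\mathbf{A}$. Since there are interfering signal autoterms from
other sources, $\hat{\mathbf{A}}$ is likely to be different from
$\mathbf{A}$. We use $\hat{\mathbf{A}}$ to construct a beamformer
applied to the data received across the array. Assuming a noise-free
scenario,

$$\mathbf{z}(t) = \frac{1}{M}\hat{\mathbf{A}}^H\mathbf{y}(t) = \frac{1}{M}\hat{\mathbf{A}}^H\mathbf{A}\,\mathbf{s}(t), \tag{17}$$

where $\mathbf{z}(t) = [z_1(t), \ldots, z_L(t)]^T$ is an $L \times 1$
vector. Obviously,

$$z_k(t) = \left(\frac{1}{M}\hat{\mathbf{a}}_k^H\mathbf{a}_k\right)s_k(t) + \sum_{i \neq k}\left(\frac{1}{M}\hat{\mathbf{a}}_k^H\mathbf{a}_i\right)s_i(t). \tag{18}$$

It is expected that $\hat{\mathbf{a}}_k$ would be a perturbed version of
$\mathbf{a}_k$, with the approximations

$$\frac{1}{M}\hat{\mathbf{a}}_k^H\mathbf{a}_k \approx \beta_{kk} = 1 \tag{19}$$

and

$$\frac{1}{M}\hat{\mathbf{a}}_k^H\mathbf{a}_i \approx \beta_{ik}, \qquad |\beta_{ik}| < 1. \tag{20}$$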

From (18)-(20), the WVD of $z_k(t)$ is given by

$$W_{z_k z_k}(t,f) \approx W_{s_k s_k}(t,f) + \sum_{i=1}^{L}\ \sum_{\substack{j=1 \\ (i,j)\neq(k,k)}}^{L} \beta_{ik}\,\beta_{jk}^{*}\, W_{s_i s_j}(t,f). \tag{21}$$

Clearly, when $j \neq i$, $|\beta_{ik}\beta_{jk}^{*}| < 1$. This shows
that in equation (21), except for the $k$th auto-source term, all other
terms, either auto- or cross-source terms, are significantly reduced in
$W_{z_k z_k}(t,f)$. In the case of a ULA, the suppression of those terms
is at least 13 dB for large values of $M$. The suppression of the
autoterms other than source $k$ is $|\beta_{ik}|^2$, which is more than
26 dB down from the $k$th source. Therefore, the effect of the
overlapping autoterms from other sources becomes negligible. If we
apply the steps (3)-(8) of the synthesis procedures of Section 2 using
the improved WVD in (21), the synthesized signal will be significantly
enhanced, as shown below.
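
The beamforming step in (17)-(18) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the steering model, the chirp parameters, and the small random perturbation used to mimic an imperfect estimate of the mixing matrix are all hypothetical choices.

```python
import numpy as np

M, N = 8, 128                       # sensors and samples (Section 5 values)
t = np.arange(N)

def steer(theta_deg):
    """Half-wavelength ULA signature with squared norm M (assumed model)."""
    return np.exp(-1j * np.pi * np.arange(M) * np.sin(np.deg2rad(theta_deg)))

A = np.column_stack([steer(-20.0), steer(20.0)])          # true mixing matrix
s = np.vstack([np.exp(1j * (0.7 * np.pi * t - 0.2 * np.pi * t**2 / N)),
               np.exp(1j * (0.3 * np.pi * t + 0.2 * np.pi * t**2 / N))])

rng = np.random.default_rng(1)      # perturbation mimics estimation error
A_hat = A + 0.02 * (rng.standard_normal(A.shape) + 1j * rng.standard_normal(A.shape))

y = A @ s                           # noise-free data, eq. (10)
z = A_hat.conj().T @ y / M          # beamformer output, eq. (17)
G = A_hat.conj().T @ A / M          # G[k, i] = (1/M) a_hat_k^H a_i, eq. (18)
leak_db = 20 * np.log10(abs(G[0, 1]) / abs(G[0, 0]))      # leakage of source 2 into z_1
```

Row `k` of `G` holds the weights in (18): the diagonal entry stays close to one, while the off-diagonal entry sets how much of the other source leaks into `z[k]`.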

5.  SIMULATION  RESULTS 

In this section, computer simulations are provided to demonstrate the
performance of the proposed technique. We consider two chirp signals
with overlapping t-f signatures incident on an eight-sensor ULA
($M = 8$) with inter-element spacing of half a wavelength. The signals
arrive at the array with angles of arrival (AOAs) of -20° and 20°, with
the respective start and end frequencies given by $(0.7\pi, 0.3\pi)$ and
$(0.3\pi, 0.7\pi)$. The length of the signal sequence is set to
$N = 128$. There is no additive noise in this example.

Fig. 1 shows the WVD of the data at the reference sensor #1. The two
signal autoterms overlap, and their cross-source terms can also be
clearly noticed. The array averaged WVD is plotted in Fig. 2. Using the
conclusions derived in Section 3, we expect that the cross-source terms
would be suppressed by about 19 dB after the array averaging process.
Indeed, such suppression is supported by the plots in Fig. 2. To
synthesize the signal, we place the mask along each t-f signature. Any
reasonable selection of the mask inevitably includes components from
the other source. Therefore, each signal synthesized following the
procedures described in Section 2 is, in essence, corrupted by the
other signal. Fig. 3(b) depicts the WVD of one synthesized but
corrupted waveform, compared to the WVD from the original source, which
is shown in Fig. 3(a). By implementing the beamformer and synthesis
procedures proposed in Section 4, we obtain a less noisy waveform. The
WVD of the improved synthesized signal is shown in Fig. 3(c). The power
leakage in Fig. 3(b) almost disappears in Fig. 3(c).
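
The quoted crossterm suppression can be cross-checked from the spatial correlation coefficient in (14). The sketch below assumes an ideal half-wavelength ULA with plane-wave steering vectors, which is an assumption about the array model and not a detail confirmed by the text.

```python
import numpy as np

M = 8
k = np.arange(M)
# Assumed half-wavelength ULA signatures for the two arrival angles.
a1 = np.exp(-1j * np.pi * k * np.sin(np.deg2rad(-20.0)))
a2 = np.exp(-1j * np.pi * k * np.sin(np.deg2rad(20.0)))

beta12 = np.vdot(a2, a1) / M                 # (1/M) a_2^H a_1, eq. (14)
suppression_db = -20 * np.log10(abs(beta12)) # crossterm weight relative to autoterms
```

Under this model the value of `suppression_db` falls between 19 and 20 dB, in line with the "about 19 dB" figure stated for the array averaged WVD.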
6. CONCLUSION


We have presented an iterative synthesis process to estimate signals
with overlapping t-f signatures based on the averaging of the
Wigner-Ville distributions across an antenna array. By first
synthesizing the source temporal waveforms and then using the results
to estimate the source spatial signatures, the problem of power leakage
that may occur in other conventional Wigner-Ville based synthesis
techniques is solved. It is shown that the proposed method provides a
clear t-f signature and yields improved synthesis performance.
Fig. 1. WVD of the two overlapping signals at a reference sensor.

Fig. 3. WVD of: (a) original signal (top); (b) synthesized signal from
array averaging (middle); (c) synthesized signal from iterative process
(bottom).
ABSTRACT

In  this  paper,  a  fast  algorithm  for  SLTF  analysis  and  SLTF 
synthesis  computations  is  presented.  The  proposed  algo¬ 
rithm  exploits  the  special  structure  of  the  SLTF  transforma¬ 
tion  matrix.  The  algorithm  requires  K  (2  log2  M  —  mul¬ 
tiplications  and  K  (4  log2  M  —  |)  additions  for  calculating 
the  biorthogonal  function  and  K  (log2  K  -  j  log2  N)  mul¬ 
tiplications  and  K  (2  log2  K  -  log2  N  -  3)  additions  for  both 
the  analysis  and  the  synthesis  transform  computations  where 
K  is  the  signal  length  and  M  and  N  are  arbitrary  numbers 
such  that  MN  =  K. 

1.  INTRODUCTION 

Time-frequency (TF) transforms are of interest in many areas due to
their natural decomposition of a signal into functions localized in
both time and frequency. Specifically, the Gabor transform has an
optimal localization in the TF domain [1]. The SLTF transform maintains
the same optimality as the Gabor transform [2]. Moreover, it overcomes
the two main problems of the critically-sampled TF transforms:
stability and localization of the window and its biorthogonal function
[2]. Compared to other TF transforms, the SLTF has several advantages
that make it suitable for many applications. First, it is a linear
critically-sampled transform. This simplifies the synthesis transform
procedure (from the TF domain to the original domain) after filtering.
This is in contrast to bilinear transforms, where difficulties are
encountered in retrieving the signal from the TF domain, or the
over-sampled transforms, where iterative methods are needed for the
synthesis transform [3]. Compared to other linear critically-sampled TF
transforms, this transform has two major advantages: stability and
localization of both the window and its biorthogonal function [2].

The direct computations of the SLTF transform, however, require
$O(K^3)$ operations for the biorthogonal function computations and
$O(K^2)$ for the analysis and the synthesis transform computations,
where $K$ is the signal length. For the Gabor transform, several
algorithms have been developed for fast computations, with results in
the range $O(K^2)$ to $O(K\log_2 K)$ [4-6]. In this paper, a fast
algorithm for SLTF transform computations is derived which drastically
reduces the computation requirements to $O(K\log_2 M)$, where
$M \ll K$, for both the analysis and synthesis transforms. The proposed
algorithm is developed via a matrix approach by exploiting the special
structure of the SLTF transformation matrix.

The author would like to acknowledge the support of King Fahd
University of Petroleum and Minerals.

2.  SLTF  TRANSFORM 

The SLTF analysis transform is defined for a finite extent
discrete signal $x(k)$, for $0 \le k < K$, as [2]:

K-1 

=  X>  (*)  7,*„  <*)  csin  (ia) 

fc  =  0 

and  the  synthesis  transform  is  defined  as: 

M- 1 N- 1 

*(*)  =EE  VA.W  csin  (ib) 

m= 0  n=  0 

where  M  and  N  are  the  number  of  analysis  samples  in  time 
and  frequency,  respectively  {MN  =  K),  am<n  are  the  SLTF 
transform  coefficients,  csin  stands  for  cos  for  even  m  and 
sin  for  odd  m,  and: 


is  the  normalized  discrete  Gaussian  window  shifted  to  the 
center  of  the  mth  window  with  6  controlling  the  window 
width,  and  7m(fc)  =  j{k—mN)  where  7 (fc)  is  the  biorthog¬ 
onal  function  to  h  (/c),  i.e.,  it  satisfies  the  condition: 

E  hm  (t)  7  M  csin  1  Mi  1 L  =  ljm 

k=0  1 

The analysis transform (1a) and the synthesis transform (1b) can be
rewritten in matrix form as [2]:

$$\mathbf{a} = E H^{-1}\mathbf{x} \quad \text{(analysis equation)} \tag{2a}$$

$$\mathbf{x} = H E\,\mathbf{a} \quad \text{(synthesis equation)} \tag{2b}$$


hm{k)  ,5  2  exp 


fc  —  mN  — 
where $\mathbf{a} \triangleq [a_{0,0}, a_{0,1}, \ldots, a_{0,N-1}, a_{1,0}, \ldots, a_{M-1,N-1}]^T$,
$\mathbf{x} \triangleq [x(0), x(1), \ldots, x(K-1)]^T$, and $E$ is a
$K \times K$ block-diagonal matrix which is given by:

E  ,  diag  (C',  S,  -C,  -.S',  —  C\  S)  (3) 

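where $C = [c_{n,k}]_{N \times N}$ is the $N$-point DCT-IV transform
matrix with:

$$c_{n,k} = \sqrt{\frac{2}{N}}\cos\!\left(\frac{\pi\left(n+\frac{1}{2}\right)\left(k+\frac{1}{2}\right)}{N}\right)$$

and $S = [s_{n,k}]_{N \times N}$ is the $N$-point DST-IV transform
matrix with:

$$s_{n,k} = \sqrt{\frac{2}{N}}\sin\!\left(\frac{\pi\left(n+\frac{1}{2}\right)\left(k+\frac{1}{2}\right)}{N}\right)$$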
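
The pairing of (2a) with (2b) requires the block-diagonal matrix $E$ to be its own inverse, which holds when each block is an orthonormal, symmetric DCT-IV or DST-IV matrix. The sketch below builds both matrices under the assumed $\sqrt{2/N}$ normalization (an assumption inferred from that requirement) so the property can be checked numerically.

```python
import numpy as np

def dct4_matrix(N):
    """Orthonormal N-point DCT-IV matrix (sqrt(2/N) scaling assumed)."""
    n = np.arange(N)
    return np.sqrt(2.0 / N) * np.cos(np.pi * np.outer(n + 0.5, n + 0.5) / N)

def dst4_matrix(N):
    """Orthonormal N-point DST-IV matrix (sqrt(2/N) scaling assumed)."""
    n = np.arange(N)
    return np.sqrt(2.0 / N) * np.sin(np.pi * np.outer(n + 0.5, n + 0.5) / N)

N = 8
C = dct4_matrix(N)
S = dst4_matrix(N)
# Both matrices are symmetric and satisfy C @ C = I and S @ S = I,
# so any block-diagonal E built from +/-C and +/-S satisfies E @ E = I.
```

With this scaling the analysis and synthesis equations invert each other without any extra normalization factor.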


The  KxK  matrix  H  provides  the  desired  windowing  effect, 
i.e.,  localization  in  the  time  domain.  H  is  given  by: 


H  = 


H0 

HiJ 


#A/-1 

H0J 


H-\ 

H2J 


Hm—1  J  Hm^zJ  •••  H0J 
where  J  is  the  N  x  N  row  exchange  matrix: 


$$J = \begin{pmatrix} 0 & \cdots & 0 & 1 \\ 0 & \cdots & 1 & 0 \\ 1 & \cdots & 0 & 0 \end{pmatrix}_{N \times N}$$


(4) 


(5) 


and $H_m$ is an $N \times N$ diagonal matrix defined by:

Hm  =  (-IjL12^1]  diag  (hm  (0) , . . ,,hm  (iV  - 1))  (6) 


with $\lfloor x \rfloor$ being the integer part of $x$.

The direct computation of the analysis transform (2a) involves an
inversion of the $K \times K$ matrix $H$, which requires $O(K^3)$
operations, and a multiplication of $EH^{-1}$ by $\mathbf{x}$, which
requires $O(K^2)$ operations. Also, direct computation of the synthesis
transform (2b) requires $O(K^2)$ operations. In the following sections,
it will be shown how to perform the transform computations with
increased efficiency by exploiting the special structure of $H$ and $E$.


3.  CALCULATION  OF  BIORTHOGONAL 
FUNCTION 

From (1a) and (2a), the diagonal or anti-diagonal elements from each
block in the $m$th block-row of $H^{-1}$ comprise the biorthogonal
function $\gamma_m(k)$.

From (4), (5), and (6), $H$ is a block matrix with $N \times N$ blocks.
These blocks are either diagonal blocks $[H_m]$ or anti-diagonal blocks
$[H_m J]$. Therefore, $H$ has non-zero elements only at the block's
diagonal for even block-column indices or the block's anti-diagonal for
odd block-column indices, i.e., $H$ has the structure:


where the solid lines indicate the only non-zero elements in $H$. The
matrix $H$ can be transformed to a block diagonal matrix with block
circulant blocks using row and column permutations as follows. Define
the permutation matrices¹ $P_1$ and $P_2$ whose encoding vectors $p_1$
and $p_2$ are given by:

$$p_1(k) = \left\lfloor \frac{k}{M} \right\rfloor + N\,(k \bmod M)$$

$$p_2(k) = l\,(N-1) + (-1)^{l}\left\lfloor \frac{k}{M} \right\rfloor + N\,(k \bmod M)$$

for $k = 0 : K-1$, where $l = k \bmod 2$ and $y \bmod x$ is the
remainder of $y/x$.
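
As an illustration of the encoding-vector convention, the sketch below turns an encoding vector into a permutation matrix and checks that it is orthogonal. The two encoding vectors follow one reading of the definitions above and should be treated as assumptions; the small sizes `M = 4`, `N = 3` and the helper name `permutation_matrix` are hypothetical.

```python
import numpy as np

def permutation_matrix(p):
    """Row k of P has its single 1 in column p[k]."""
    K = len(p)
    P = np.zeros((K, K), dtype=int)
    P[np.arange(K), p] = 1
    return P

M, N = 4, 3                     # small example sizes, M even (assumption)
K = M * N
k = np.arange(K)
l = k % 2

# Encoding vectors as read from the definitions above (assumed reading).
p1 = k // M + N * (k % M)
p2 = l * (N - 1) + (-1) ** l * (k // M) + N * (k % M)

P1 = permutation_matrix(p1)
P2 = permutation_matrix(p2)

v = np.arange(K) * 10
reordered = P1 @ v              # equals v[p1]: only indices change
```

Multiplying by such a matrix costs no arithmetic, only index reordering, which is why the permutations do not contribute to the operation counts.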

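The matrix $P_2 H P_1^T$ is a block diagonal matrix given by:

$$P_2 H P_1^T = \mathrm{diag}(D_0, D_1, \ldots, D_{N-1}) \tag{7}$$

where $D_n$ is an $M \times M$ block-circulant matrix with each block
being a $2 \times 2$ matrix $D_{n,m}$:

$$D_n = \begin{pmatrix}
D_{n,0} & D_{n,\frac{M}{2}-1} & \cdots & D_{n,1} \\
D_{n,1} & D_{n,0} & \cdots & D_{n,2} \\
\vdots & \vdots & \ddots & \vdots \\
D_{n,\frac{M}{2}-1} & D_{n,\frac{M}{2}-2} & \cdots & D_{n,0}
\end{pmatrix} \tag{8}$$

Each $D_n$ can be inverted separately. Therefore, $H^{-1}$ is given by²:

$$H^{-1} = P_1^T\,\mathrm{diag}(D_0^{-1}, D_1^{-1}, \ldots, D_{N-1}^{-1})\,P_2 \tag{9}$$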


Also,  each  D„  can  be  converted  to  block-diagonal  matrix 

by: 

Bn  =  ( ' E m  ®  •  E)„  •  {Em  ®  /2^  (10) 

=  diag  {^Bn, o> 7?„,i ,  •  •  • , ^ 

1  The  permutation  matrix  P  is  the  identity  matrix  with  rows  reordered 
17].  The  permutation  matrix  is  represented  by  the  encoding  vector  p  whose 
element  p(k)  is  the  column  index  of  the  sole  “1"  in  the  kth  row  of  P. 
2Permutation  matrices  are  orthogonal,  i.e.,  P-1  =  PT  . 
where $\otimes$ is the Kronecker tensor product, $I_2$ is the $2 \times 2$ identity
matrix, and $E_{\frac{M}{2}} = [e_{m,k}]_{\frac{M}{2} \times \frac{M}{2}}$ is the $\frac{M}{2}$-point discrete


Fourier  transform  matrix  with  e. 


.  exn _ 

'  KXH  M/2 


and $B_{n,m}$ is a $2 \times 2$ matrix. Equation (10) is a simplified
form of Theorem 5.6.4 of [8]. Since $B_n$ is a block-diagonal matrix
with $2 \times 2$ blocks, its inversion reduces to $\frac{M}{2}$
inversions of the $2 \times 2$ matrices $B_{n,m}$. From (10), $D_n^{-1}$ is given by:

D-1  =  (£m  ®  I2y T  ■  B-1  •  (£m  ®  /2)  (1 1) 
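
The idea behind (10) and (11) can be sketched numerically: a block-circulant matrix with $2 \times 2$ blocks becomes block diagonal under a DFT taken across the block index, so its inverse needs only small $2 \times 2$ inversions plus forward and inverse FFTs. The code below is a generic illustration with random blocks (hypothetical data), not the SLTF matrices themselves, and it assumes the unnormalized FFT convention used by NumPy.

```python
import numpy as np

def block_circulant(blocks):
    """Assemble a block-circulant matrix; block (i, j) is blocks[(i - j) % nb]."""
    nb, b, _ = blocks.shape
    D = np.zeros((nb * b, nb * b), dtype=complex)
    for i in range(nb):
        for j in range(nb):
            D[i * b:(i + 1) * b, j * b:(j + 1) * b] = blocks[(i - j) % nb]
    return D

def invert_block_circulant(blocks):
    """Invert via FFT over the block index and batched 2x2 inversions."""
    B = np.fft.fft(blocks, axis=0)           # diagonal blocks B_k
    B_inv = np.linalg.inv(B)                 # one small inverse per block
    inv_blocks = np.fft.ifft(B_inv, axis=0)  # blocks of the inverse matrix
    return B, block_circulant(inv_blocks)

rng = np.random.default_rng(2)
nb = 8                                        # number of 2x2 blocks per row
blocks = rng.standard_normal((nb, 2, 2))
blocks[0] += 5.0 * np.eye(2)                  # keep the example well conditioned

D = block_circulant(blocks)
B, D_inv = invert_block_circulant(blocks)
```

The inverse is again block circulant, so only its first block column has to be computed and stored.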

Instead of using (11) to calculate $D_n^{-1}$, the $2 \times 2$ matrices
$B_{n,m}$ can be directly calculated using:


=  f  0 12 ) 


r  1  nnn  C 


and  Dn  1  can  be  directly  obtained  as: 


D7  = 


Ai,o 

&n,  M_i  • 

•  •  Ai.i 

Ai.i 

Dn,o 

•  •  Ai,2 

Dn,t$- 1 

2  • 

•  •  Ai.O 

where: 


=  (£m  ®/2) 


'nM- 1 


3n,M-1 


The $H^{-1}$ computation can be further simplified as follows:

Firstly, to save time and memory requirements, instead of establishing
the $K \times K$ matrix $P_2 H P_1^T$, the blocks $D_{n,m}$ required in
(12) can be obtained directly for any window function $h(k)$
by:


^2 m(n)  h(2m— 1)  modJW  in) 

k'lm  1 1  (rrl)  ^2m  (rrl) 


for  m  -  0, . . . ,  M  —  1 .  Using  P3  leads  to: 

p3(^®/2)pr=f£,f  E  1  in) 

~T 

Thus,  (12)  can  be  written  as 


Similarly,  (14)  can  be  written  as 


Note  that  multiplication  by  the  permutation  matrices  Pi, 
P2  and  P3  represents  only  a  change  of  row  or  column  in¬ 
dices  and  multiplication  by  Em  represents  taking  the  Ap¬ 
point  FFT.  Thus,  using  (15),  (S),  (19),  and  (13),  calculat¬ 
ing  Dr1  reduces  to  four  times  the  y  -point  FFT  operation 
to  calculate  B,hm,  y  times  the  inversion  of  a  2  x  2  matrix 
Bn,m,  and  four  times  they -point  inverse  FFT  operation  to 
calculate  D'n  m.  The  M-point  FFT  takes  y  log2  M  mul¬ 
tiplications  and  M  log2  M  additions  and  the  inversion  of  a 
2x2  matrix  takes  ^multiplications  and  5  additions,  and  the 
whole  inversion  process  required  to  calculate  D”1  takes: 

2  x  4  log2  f )  +  12  f  =  M  (2  log2  M  +  5)  multi¬ 
plications  and  M  (4  log2  M  -  §)  additions  and  the  whole 
inversion  process  of  H  takes  K  (2  log2  M  +  5)  multiplica¬ 
tions  and  K  (4  log2  M  -  §)  additions. 

4.  SLTF  ANALYSIS  TRANSFORM 
COMPUTATIONS 

The computations of the SLTF transform coefficients $a_{m,n}$
can be reduced as follows:

Substituting (9) in (2a) leads to:

$$\mathbf{a} = E\,P_1^T\,\mathrm{diag}(D_0^{-1}, D_1^{-1}, \ldots, D_{N-1}^{-1})\,P_2\,\mathbf{x} \tag{20}$$


for  m  =  0, 1 , . . . ,  y  —  1  and  n  =  0, 1 , . . . ,  A"  —  1  where 
nl  =  N  —  1  —  n. 

Secondly,  the  MxM  matrix  (Em  ®  /2 j  can  be  con¬ 
verted  to  a  block-diagonal  matrix  with  y  x  y  blocks  using 
the  permutation  matrix  P3  whose  encoding  vector  is  given 
by: 

Pz(m)  =  +2^mmod40  (16) 


From  (11),  each  D^1  can  be  replaced  by: 

0  /2  j  B"1  (Em  ®  J2j  .  Substituting  (17)  in  (11) 

gives: 


P3 


where  B"1 
Therefore, (20) can be reduced to:

a  =  EP*PjETpAx 

diag  (j30  o, . .  •  'Bq  m_-|5-Bi.o —  x 

PJEiPaP2x  (21) 

where  P4  =  diag  (P3,  P3,  •  •  • ,  Pz)mNxmn  »  and 
Ei  =  diag 

3  V  2  2  2  ) MNxMN 

Thus equation (21) includes:

• $2N$ times taking the $\frac{M}{2}$-point FFT of the vector
$P_4 P_2 \mathbf{x}$, which requires $\frac{NM}{2}\log_2 M - NM$
multiplications and $NM\log_2 M - 2NM$ additions;

• $N\frac{M}{2}$ times the multiplication of a $2 \times 2$ matrix
$B_{n,m}^{-1}$ by a $2 \times 1$ vector, which requires $2NM$
multiplications and $NM$ additions;

• $2N$ times taking the $\frac{M}{2}$-point inverse FFT, which requires
$\frac{NM}{2}\log_2 M - NM$ multiplications and $NM\log_2 M - 2NM$
additions;

• $M$ times taking the $N$-point DCT-IV or DST-IV transforms. Assuming
that the $N$-point DCT-IV or DST-IV transform requires
$\frac{N}{2}\log_2 N$ multiplications and $N\log_2 N$ additions, this
operation requires $\frac{NM}{2}\log_2 N$ multiplications and
$NM\log_2 N$ additions.

Thus, calculating the transform coefficients $a_{m,n}$ requires
$NM\log_2 M + \frac{NM}{2}\log_2 N = K\left(\log_2 K - \frac{1}{2}\log_2 N\right)$
multiplications and $K(2\log_2 K - \log_2 N - 3)$ additions, i.e.,
less than $O(K\log_2 K)$.
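
The totals can be checked by summing the per-stage counts listed above and comparing them with the closed forms. The short script below is only an arithmetic cross-check; the function names are illustrative.

```python
import math

def analysis_op_counts(M, N):
    """Sum the per-stage multiplication and addition counts for eq. (21)."""
    lg_m, lg_n = math.log2(M), math.log2(N)
    fft_mul = N * M / 2 * lg_m - N * M       # 2N FFTs of length M/2
    fft_add = N * M * lg_m - 2 * N * M
    mat_mul, mat_add = 2 * N * M, N * M      # N*M/2 products of 2x2 by 2x1
    dct_mul = N * M / 2 * lg_n               # M transforms of length N
    dct_add = N * M * lg_n
    mul = 2 * fft_mul + mat_mul + dct_mul    # forward and inverse FFT stages
    add = 2 * fft_add + mat_add + dct_add
    return mul, add

def closed_form(M, N):
    """Closed-form totals quoted in the text, with K = M * N."""
    K = M * N
    lg_k, lg_n = math.log2(K), math.log2(N)
    return K * (lg_k - 0.5 * lg_n), K * (2 * lg_k - lg_n - 3)

counts = analysis_op_counts(32, 32)
totals = closed_form(32, 32)
```

The subtracted terms of the two FFT stages cancel against the cost of the $2 \times 2$ products in the multiplication count, which is why no constant term survives there.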

5.  SLTF  SYNTHESIS  TRANSFORM 
COMPUTATIONS 

The computations of the synthesized signal $x(k)$ can be reduced as
follows:

Substituting (7) in (2b) leads to:

x  =  P^diag(D0,  D1,...,DAr_i)P2Ea  (22) 

Each  Dn  can  be  replaced  by: 


E  M 

'  El 

D„  =  Pf 

Em. 

P3B„Pr 

2  Et 

EE  m 

2 

2  . 

where  Bn  =  diag  (Bn$,BnA,...,Bn  m_-,)  •  Therefore, 
(22)  can  be  reduced  to 

X  =  Pf  P^Ei  P4  x 

diag  •  •  •  ,Bq  . . . ,BN_^ ,m-i^  x 

Pj  E^  P4P2  Ea  (23) 


Thus equation (23) includes

• $M$ times taking the $N$-point DCT-IV or DST-IV transforms;

• $2N$ times taking the $\frac{M}{2}$-point inverse FFT;

• $N\frac{M}{2}$ times multiplication of a $2 \times 2$ matrix by a
$2 \times 1$ vector;

• $2N$ times taking the $\frac{M}{2}$-point FFT.

Thus, calculating the synthesized signal $x(k)$ requires
$K\left(\log_2 K - \frac{1}{2}\log_2 N\right)$ multiplications and
$K(2\log_2 K - \log_2 N - 3)$ additions.

6.  CONCLUSION 

In conclusion, by exploiting the special structure of the SLTF
transformation matrix $EH^{-1}$, a very fast algorithm for SLTF
computations is derived. The proposed algorithm reduces the complexity
to the order $O(K\log_2 M)$ instead of the order $O(K^2)$ for the
transform coefficient computations, and to the order $O(K\log_2 M)$
instead of the order $O(K^3)$ for the biorthogonal function
computations.
ABSTRACT

A fractional-Fourier-domain realization of the weighted Wigner
distribution (or S-method), producing auto-terms close to the ones in
the Wigner distribution itself, but with reduced cross-terms, is
presented. The computational cost of this fractional-domain realization
is the same as the computational cost of the realizations in the time
or the frequency domain, since the short-time Fourier transform of the
fractional Fourier transform of a signal corresponds to the short-time
Fourier transform of the signal itself, with the window being the
fractional Fourier transform of the initial one. The appropriate
fractional domain is found from the analysis of the second-order
fractional Fourier transform moments. Numerical simulations show a
qualitative advantage in the time-frequency representation when the
calculation is done in the optimal fractional domain.

1.  INTRODUCTION 

Different types of joint time-frequency distributions (TFDs) are
nowadays used in signal processing in order to extract the
characteristic behavior of a signal. The advantages and disadvantages
of most joint representations are well known. Thus, for example, the
appropriate short-time Fourier transform (STFT) of multicomponent
signals, although not stressing the auto-terms very well, is free from
cross-terms if the components do not overlap. On the other hand, the
Wigner distribution (WD) of such signals is highly concentrated, but
suffers from cross-terms, which may hide some of the auto-terms.
Various distributions, belonging to Cohen's class of TFDs, are defined
in order to try to find the optimal representation that will
significantly reduce the cross-terms without significantly changing
the auto-terms. Since the commonly used Cohen class TFDs - such as, for
example, the Choi-Williams, Bertrand, Butterworth, and Born-Jordan
distributions - have been designed for a general signal, they do not
correspond to the optimal representation of a particular signal. In
order to construct an optimal TFD, the distribution kernel should be
adapted to the given signal type [1]-[5]. Moreover, the adaptation has
to be made fast and with minimum knowledge about the signal to be
analyzed.

It was shown in [6]-[10] that the weighted pseudo WD, or the S-method
(SWWD), of a multicomponent signal leads to a representation with
significantly reduced cross-terms, while the auto-terms are close to or
exactly the same as the ones in the pseudo WD. This signal
representation is based on the STFT, and two different forms of it have
been proposed [7, 10]. One of them combines the values of the STFT
along the frequency axis for a given time instant, while the other is
based on calculation in the time direction for a given frequency value.
The cross-term reduction and the efficiency of convergence towards the
WD auto-terms depend on the orientation of the auto-terms in the
time-frequency plane. Thus, if the auto-terms are oriented in parallel
to the time (or frequency) axis, then the STFT-based calculations have
to be applied in the frequency (or time) domain, correspondingly. In
the more general case, however, the auto-terms may lie in some region
that might be oriented in a skew direction in the time-frequency plane.

In this paper we introduce the SWWD in the fractional (mixed
time-frequency) domain. To this aim, the STFT of the fractional FT of
the signal has to be calculated. We will see that the STFT of a
signal's fractional FT with a given window corresponds to the STFT of
the signal itself with the window being the fractional FT of the
original one, combined with a rotation of the coordinate system. The
STFT in the most appropriate fractional domain can thus be performed
without significantly greater computational cost. After we get the STFT
in the optimal fractional domain, the standard, very simple SWWD
calculation is performed in that domain. As a result, we obtain a
distribution that preserves the WD auto-terms and reduces the
cross-terms at the same time. The standard time- or frequency-direction
realizations [6]-[10] follow as special cases.

In order to find the fractional domain in which a signal is represented
in the simplest way and which matches best to the chosen model, and to
find the corresponding STFT window, the analysis of fractional FT
moments is applied. In particular, we suppose that an optimal SWWD
calculation direction corresponds to minimal signal width, i.e.,
minimal fractional second-order moment. Determination of this moment
can be done analytically, based on three known moments for three
different directions. The proposed approach is demonstrated on
examples.

2.  STFT  IN  THE  FRACTIONAL  FT  DOMAIN 

The STFT was originally introduced for better time localization of
frequency components of a signal $f(x)$, by using a suitable, commonly
real-valued, window $g(x)$:

$$ST_f(t,\omega) = \int_{-\infty}^{\infty} f(t+x)\,g^{*}(x)\exp(-j2\pi x\omega)\,dx. \tag{1}$$

Clearly, for analyzing a signal with a constant frequency content, one
needs a wide window, while for the analysis of pulse-like signals, a
narrow window has to be applied. This rule also holds for the analysis
of very wide-spread and very narrow signals, respectively. So we can
adjust the window if the signal width is known. Suppose now that the
minimal signal width does not correspond to the time or the frequency
domain. Then an affine transformation of the phase plane could lead to
an optimal (for example, minimal-width) signal representation. In this
paper we restrict ourselves to a rotation of the coordinate system.

To represent a signal in the new coordinate system, we use the fact
that a rotation of the time-frequency plane corresponds to a fractional
FT of $f(x)$. The fractional FT of a signal $f(x)$ can be defined as
[11, 12]

$$R_f^{\alpha}(u) = \int_{-\infty}^{\infty} K(\alpha, x, u)\,f(x)\,dx, \tag{2}$$

where the kernel $K(\alpha, x, u)$ is given by

$$K(\alpha, x, u) = \frac{\exp\left(j\frac{1}{2}\alpha\right)}{\sqrt{j\sin\alpha}}\exp\!\left(j\pi\,\frac{(x^2+u^2)\cos\alpha - 2ux}{\sin\alpha}\right). \tag{3}$$

Note that, in particular, $R_f^{0}(u) = f(u)$, $R_f^{\pi}(u) = f(-u)$, and
that $R_f^{\pi/2}(u)$ corresponds to the normal FT of $f(x)$.

Let us consider the fractional STFT $ST_f^{\alpha}(t,\omega)$, defined
as the STFT of the fractional FT $R_f^{\alpha}(x)$ of the signal $f(x)$

$$ST_f^{\alpha}(t,\omega) = \int_{-\infty}^{\infty} R_f^{\alpha}(t+x)\,g^{*}(x)\exp(-j2\pi x\omega)\,dx. \tag{4}$$

From the symmetry property $R_{f^*}^{\alpha}(u) = [R_f^{-\alpha}(u)]^{*}$
of the fractional FT, and from the relations between the fractional FT
and the STFT as given in [11, Section IV], we have the
relationship

/OO 

RCf{x)g*{t  -  X.)  exp(—  j2nx.uj)dx  (5) 

-OO 

POO 

=  exp(jTTUv)  /  f(x)\Rga(u  -  x)}*  exp(-j2irxv)dx, 

J  —  OO 

with $u = t\cos\alpha - \omega\sin\alpha$, $v = t\sin\alpha + \omega\cos\alpha$.
We thus conclude that calculating the STFT of the signal's fractional
FT $R_f^{\alpha}(x)$ with the window $g(x)$ [cf. (5)] is the same as
calculating the STFT of the signal $f(x)$ itself with the window
$R_g^{-\alpha}(x)$, combined with a rotation of the coordinate system.
Since $R_g^{-\alpha}(x)$, which is the (inverse) fractional FT of the
window $g(x)$, can be calculated for all possible angles and stored in
computer memory, this implies that calculation of the fractional STFT
$ST_f^{\alpha}(t,\omega)$ will not be significantly more demanding in
numerical complexity than calculation of the standard STFT
$ST_f^{0}(t,\omega)$.


3.  FRACTIONAL  FT  MOMENTS 

It is known that the signal width in the time or the frequency domain
can be estimated from its second-order central moments. Analogously,
the signal width in a fractional domain is related to its second-order
central fractional FT moments [13].

The normalized second-order central fractional FT moment $p_\alpha$ is
defined by

1  f  i  i-»a /  m2/  \2  i  (mQ  n?a)  ( 

Pa=E  I R/(x)\  {x-ma)  dx  = - - - ,  (6) 

J  —OO 

where  the  zero-order  moment  E  —  \RJ(x)\2dx  repre¬ 
sents  the  signal’s  energy  (which,  in  accordance  with  Parse- 
val’s  theorem  for  a  unitary  transformation,  does  not  depend 


on  a);  where  the  first-order  moment  ma  =  \Rj(x)\2xdx 
is  related  to  the  center  of  gravity  of  the  fractional  power 
spectrum;  and  where  wa  =  |iij (x)\2x?dx  is  the  second- 
order  moment .  The  first-order  moment  m„  in  the  fractional 
o-domain  can  be  calculated  from  the  relationship 

$$m_\alpha = m_0\cos\alpha + m_{\pi/2}\sin\alpha, \tag{7}$$

where $m_0$ and $m_{\pi/2}$ are the first-order moments in the time and
the frequency domain, respectively. Meanwhile, any second-order moment
$w_\alpha$ can be obtained from three others, $w_\beta$, $w_\gamma$, and
$w_\mu$, say, if the angles $\beta$, $\gamma$, and $\mu$ are different
and the difference between them is not equal to $\pi$ [13]. Let us
choose three second-order moments: $w_0$, $w_{\pi/2}$, and $w_{\pi/4}$.
Then using the results from [13], we have:

$$w_\alpha = w_0\cos^2\alpha + w_{\pi/2}\sin^2\alpha + \left[w_{\pi/4} - \tfrac{1}{2}(w_0 + w_{\pi/2})\right]\sin 2\alpha. \tag{8}$$

Taking into account Eqs. (6), (7), and (8), we conclude that three
fractional FT power spectra define all normalized central second-order
moments $p_\alpha$, which characterize the signal widths in the
corresponding fractional domains:

$$p_\alpha E = p_0\cos^2\alpha + p_{\pi/2}\sin^2\alpha + \left[w_{\pi/4} - m_0 m_{\pi/2} - \tfrac{1}{2}(w_0 + w_{\pi/2})\right]\sin 2\alpha. \tag{9}$$

In order to find the fractional domain where the signal
has an extremal (minimum or maximum) width, we study
the behavior of the derivatives of $p_\alpha$. It is easy to see from
Eq. (9) that the first derivative of $p_\alpha$ equals zero for those
angles $\alpha_e$ for which

$$\tan 2\alpha_e = \frac{2(w_{\pi/4} - m_0 m_{\pi/2}) - (w_0 + w_{\pi/2})}{p_0 - p_{\pi/2}}. \qquad (10)$$

Since the fractional FT is periodic in $\alpha$ with period $2\pi$
and satisfies the half-period relation $R_f^{\alpha+\pi}(x) = R_f^{\alpha}(-x)$,
the signal width takes a minimum and a maximum value
once over the region $\alpha \in [0,\pi)$. From the behavior of the
second derivative of $p_\alpha$ for $\alpha = \alpha_e$,
$d^2p_\alpha/d\alpha^2|_{\alpha=\alpha_e} = 2(p_{\pi/2} - p_0)/\cos 2\alpha_e$,
we conclude that the signal reaches its
minimum width for that value $\alpha_e$ for which $\cos 2\alpha_e$ has the
same sign as $p_{\pi/2} - p_0$; the other value of $\alpha_e$ in the interval
$[0,\pi)$ then corresponds to the maximum width. Thus, the
appropriate fractional domain where the signal is best
concentrated or most widely spread can be found from the
knowledge of only three fractional power spectra.
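
As a minimal numerical sketch of Eqs. (9) and (10), the extremal angles can be computed directly from three normalized central moments. The sketch relies on the identity $w_{\pi/4} - m_0 m_{\pi/2} - \frac{1}{2}(w_0 + w_{\pi/2}) = p_{\pi/4} - \frac{1}{2}(p_0 + p_{\pi/2})$, which follows from evaluating Eq. (9) at $\alpha = \pi/4$. The function names `width` and `extremal_angles` are illustrative assumptions and do not come from the original text.

```python
import math


def width(alpha, p0, p_half, p_quarter):
    """Evaluate Eq. (9): normalized central second-order moment at angle alpha (radians)."""
    # Mixed term of Eq. (9), expressed through p_{pi/4} (identity derived from Eq. (9)).
    c = p_quarter - 0.5 * (p0 + p_half)
    return (p0 * math.cos(alpha) ** 2
            + p_half * math.sin(alpha) ** 2
            + c * math.sin(2.0 * alpha))


def extremal_angles(p0, p_half, p_quarter):
    """Return (alpha_min, alpha_max) in [0, pi), the two solutions of Eq. (10)."""
    c = p_quarter - 0.5 * (p0 + p_half)
    # One solution of tan(2*alpha) = 2c / (p0 - p_half); atan2 also handles p0 == p_half.
    first = (0.5 * math.atan2(2.0 * c, p0 - p_half)) % math.pi
    # The second stationary point lies pi/2 away from the first one.
    second = (first + 0.5 * math.pi) % math.pi
    if width(first, p0, p_half, p_quarter) <= width(second, p0, p_half, p_quarter):
        return first, second
    return second, first


# Moment values quoted in the numerical example of Section 5 (rounded inputs).
alpha_min, alpha_max = extremal_angles(1.0, 1.38, 0.07)
```

With these rounded inputs the minimum lands near 40 degrees, close to the angle quoted in the example of Section 5. Choosing the minimum by comparing the two widths is equivalent to the sign test on $\cos 2\alpha_e$ described above.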


4.  WEIGHTED  WIGNER  DISTRIBUTION 
(S-METHOD) 

In the previous section we have discussed how to find the
fractional angle corresponding to the minimal or the maximal
spread of the signal. In this section we discuss the
rotated version of the weighted WD and use the knowledge
of the optimal fractional angle to find its optimal realization.
The SWWD has been introduced for the analysis of
multicomponent signals, with the aim to produce a representation
close to the sum of the WDs of each component
separately, but with reduced (or even without) cross-terms.

Consider a multicomponent signal $f(x) = \sum_i f_i(x)$.
Its pseudo WD, defined by

$$PWD_f(t,\omega) = \int_{-\infty}^{\infty} f(t + \tfrac{1}{2}x)\,f^*(t - \tfrac{1}{2}x)\,g(\tfrac{1}{2}x)\,g^*(-\tfrac{1}{2}x)\exp(-j2\pi x\omega)\,dx, \qquad (11)$$

has the following form:
$PWD_f(t,\omega) = \sum_i PWD_{f_i}(t,\omega) + \sum_i \sum_{k \neq i} PWD_{f_i f_k}(t,\omega)$.
In most applications, the aim is to get a distribution that contains only
the auto-terms, $PWD_f(t,\omega) = \sum_i PWD_{f_i}(t,\omega)$. It is
also known that the WD, among all quadratic signal-independent
time-frequency distributions, has the best auto-term
concentration. In most cases the reduction of cross-terms
is obtained at the cost of auto-term degradation.


Figure 1: a) Wigner distribution of the signal, b) The
SWWD calculated in the frequency direction, c) Rotated
Wigner distribution (Wigner distribution of the fractional
Fourier transform), d) The SWWD calculated in the "optimal"
fractional frequency direction.


The weighted pseudo WD can be written as [6, 10]

$$P_f(t,\omega) = \int_{-\infty}^{\infty} ST_f^0(t,\omega+\theta)\,z(\theta)\,ST_f^{0*}(t,\omega-\theta)\,d\theta, \qquad (12)$$

where $z(\theta)$ is a weighting function used to exclude the
interference pattern between frequency-misaligned versions,
while it should be wide enough to provide complete
integration over the auto-terms of the STFT $ST_f^0(t,\omega)$. It is
easy to see that if $z(\theta) = 1$ we get the pseudo WD (11),
whereas for $z(\theta) = \delta(\theta)$ we obtain the time-varying
spectrogram. If the width of $z(\theta)$ is somewhere in between, we can
expect - as was proved in [6] and [10] - that the corresponding
distribution combines the nice properties of both
the spectrogram and the WD. It is known that, unlike the
WD, the spectrogram does not suffer from cross-terms. On
the other hand, the spectrogram has significant leakage
due to the use of a window, which is less pronounced in the case
of the WD. By choosing an appropriate function $z(\theta)$, the
sharpness of the WD can be preserved while the cross-terms
are reduced or even completely removed. In this case
the time window has to be such that the components of the
STFT are not far from the instantaneous frequencies of the
signal components, in order to obtain fast convergence inside
$z(\theta)$.

The SWWD can also be calculated based on time-direction
combined STFTs. It is then of the form [7, 10]

$$\int_{-\infty}^{\infty} ST_f^0(t+\theta,\omega)\,z(\theta)\exp(j4\pi\omega\theta)\,ST_f^{0*}(t-\theta,\omega)\,d\theta. \qquad (13)$$

Which one of the two forms [(12) or (13)] would produce
better results depends on the signal. If the auto-terms in the
STFT are well concentrated along the frequency direction,
then (12) would be a better choice, and vice versa.

We have already found that for a given signal there exists
a fractional domain where the STFT can be performed in
an optimal way, based on the fractional moments. Finding
the domain where the signal is best concentrated (minimal
second-order moment), we can expect that the application
of the SWWD in the domain where the direction of best
concentration is "the frequency axis" ($\alpha = \alpha_e - \frac{1}{2}\pi$) will
be the most efficient one. There, the FT of the signal's
fractional FT occupies the narrowest range. The SWWD in
this fractional domain is defined as

$$\int_{-\infty}^{\infty} ST_f^{\alpha}(u,v+\theta)\,z(\theta)\,ST_f^{\alpha*}(u,v-\theta)\,d\theta, \qquad (14)$$

where $ST_f^{\alpha}(u,v)$ and $u,v$ are defined by Eqs. (4) and (5),
respectively. Using the rotational properties of the STFT
[cf. (5)], we could rewrite the definition (14) as

$$\int_{-\infty}^{\infty} ST_f^0(t+\theta\sin\alpha,\omega+\theta\cos\alpha)\,z(\theta)\exp(j4\pi\omega\theta\sin\alpha)\,ST_f^{0*}(t-\theta\sin\alpha,\omega-\theta\cos\alpha)\,d\theta, \qquad (15)$$

from which it is clear that the SWWD in the fractional domain
corresponds to the SWWD calculated simultaneously
in the time and the frequency direction. The two cases (12)
and (13) follow as special cases from (15) with $\alpha = 0$ and
$\alpha = \frac{1}{2}\pi$, respectively.

5. DISCRETE FORM AND EXAMPLE



The analog form (15) suggests that the discrete form of the
SWWD in an arbitrary domain could be calculated based
on the signal's normal STFT. However, the values of the
STFT arguments do not correspond to the discretization
grid, and the STFT values should be calculated by using
some interpolation for each time-frequency point and a given
$\alpha$. A much simpler calculation is based on (4) [using (5)]
and (14). After the angle $\alpha_e$ - for which the second-order
fractional moment is minimal, see Section 3, Eq. (10) - has
been determined, the discrete fractional FT $R_f^{\alpha}(n)$ of the
signal (or of the window) for the angle $\alpha = \alpha_e - \frac{1}{2}\pi$, that
will result in the SWWD calculation along the "frequency
axis" of a domain, is calculated. The discrete STFT reads

$$ST_f^{\alpha}(n,k) = \sum_{m=-N/2}^{N/2-1} R_f^{\alpha}(n+m)\,g(m)\exp(-j2\pi mk/N).$$

The discrete form of (14) reads

$$P_f(n,k) = |ST_f^{\alpha}(n,k)|^2 + 2\,\mathrm{Re}\Big\{\sum_{m=1}^{N_z} ST_f^{\alpha}(n,k+m)\,ST_f^{\alpha*}(n,k-m)\Big\}, \qquad (16)$$

where a rectangular $z(m)$ of the width $2N_z + 1$ is assumed.
Therefore, the calculation of the SWWD can be understood
as the calculation of the fractional spectrogram in the domain
defined by $\alpha$, and its improvement by the terms
$2\,\mathrm{Re}\{ST_f^{\alpha}(n,k+m)\,ST_f^{\alpha*}(n,k-m)\}$ towards the rotated WD quality of
auto-terms, without or with reduced cross-terms. Taking just a
few of these fractional spectrogram correcting terms, around
the considered time-frequency point, we start immediately
improving the auto-term concentration, while the cross-terms
will start appearing only when we take values from another
auto-term. If the width of the
window $z(m)$ were $N_z = \frac{1}{2}N$, we would get the rotated WD.
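
The discrete form (16) can be sketched in a few lines for a single time instant. This is only an illustration under stated assumptions: the name `s_method` is hypothetical, the STFT values are supplied as a plain list of complex numbers, and correcting terms that would fall outside the frequency grid are simply skipped (the text does not specify the edge handling).

```python
def s_method(stft_row, n_z):
    """Apply Eq. (16) to the STFT values of one time instant.

    stft_row: list of complex STFT values indexed by the frequency bin k.
    n_z: number of correcting terms (rectangular z(m) of width 2*n_z + 1).
    """
    n_freq = len(stft_row)
    result = []
    for k in range(n_freq):
        # Spectrogram term |ST(n, k)|^2.
        value = abs(stft_row[k]) ** 2
        for m in range(1, n_z + 1):
            # Assumption: stop when k - m or k + m leaves the frequency grid.
            if k - m < 0 or k + m >= n_freq:
                break
            # Correcting term 2 Re{ST(n, k + m) ST*(n, k - m)}.
            value += 2.0 * (stft_row[k + m] * stft_row[k - m].conjugate()).real
        result.append(value)
    return result
```

With `n_z = 0` the function returns the spectrogram; increasing `n_z` adds correcting terms one at a time, which mirrors the gradual transition towards the (rotated) WD described above.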

As an example, consider the signal

$$f(t) = \exp[-(3t)^8]\{\exp[j(192\pi t^2 - 8\cos(4\pi t)/\pi)] + \exp[j(64\pi t^2 + 8\cos(4\pi t)/\pi)]\}$$

sampled at $T = 1/256$. A Hanning lag window, with $N_w = 128$
samples, is used for the STFT calculation. The values
of the normalized central moments are $p_0 = 1$, $p_{\pi/2} = 1.38$,
$p_{\pi/4} = 0.07$. According to (10), and using the fact that
$p_0 < p_{\pi/2}$, we get $\alpha_e = 41°$. The second-order moment in
this direction is smaller than in any other direction:
$p_{41°} = 0.057$. Now the fractional FT of the signal for the angle
$\alpha = \alpha_e - \frac{1}{2}\pi = -49°$ (with $p_{-49°} = 2.01$) can be calculated
by using the discrete fractional FT algorithms, or just by
using the inversion property of the rotated WD. The next
step is to calculate the STFT of the fractional FT and to
use it in (16).
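
For readers who want to reproduce the starting point of this example, the test signal can be sampled as follows. This is a sketch only: the function name `example_signal` and the observation interval $[-0.5, 0.5)$ are assumptions made here for illustration; the text specifies only the signal and the sampling step $T = 1/256$.

```python
import cmath
import math


def example_signal(t):
    """Two-component test signal of Section 5, evaluated at time t."""
    envelope = math.exp(-(3.0 * t) ** 8)
    swing = 8.0 * math.cos(4.0 * math.pi * t) / math.pi
    phase_1 = 192.0 * math.pi * t ** 2 - swing
    phase_2 = 64.0 * math.pi * t ** 2 + swing
    return envelope * (cmath.exp(1j * phase_1) + cmath.exp(1j * phase_2))


T = 1.0 / 256.0
# Assumption: 256 samples centred on t = 0; the envelope is negligible at the edges.
samples = [example_signal(n * T) for n in range(-128, 128)]
```

The flat-topped envelope $\exp[-(3t)^8]$ confines the signal to roughly $|t| < 1/3$, so the chosen interval captures essentially all of its energy.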

The results of this analysis are presented in Fig. 1. The
standard WD is shown in Fig. 1a. The SWWD calculated
by the standard definition, i.e., along the frequency axis,
with $N_z = 10$ correcting terms, is presented in Fig. 1b. We
see that some cross-terms already appear, although the
auto-terms are still very different from those in the WD in Fig. 1a.
The reason lies in the very significant spread of one component
along the frequency axis. Fig. 1c shows the WD of the
fractional FT for $\alpha = -49°$, obtained as the optimal angle
for this signal; note that it is just a rotated version of
the original WD. The SWWD based on the fractional FT is
presented in Fig. 1d. We can see that, as a consequence of
the high concentration of the components along the optimal
fractional angle, we have almost achieved the goal of getting the
auto-terms of the WD without any cross-terms.


Note  that  if  the  signal  is  already  well  concentrated  in 
time  or  in  frequency,  then  the  proposed  procedure  will  also 
produce  the  standard  calculation  directions,  as  special  cases. 

6.  CONCLUSION 

A generalized form of the weighted WD (or SWWD) is presented.
The realization is done in the fractional FT domain
with minimal signal width. This domain is optimal with
respect to auto-term convergence and cross-term suppression
in the SWWD. Further research could be directed toward
the application of local optimization of the angle $\alpha$ as
a time-dependent one.

ABSTRACT

The connection between the instantaneous frequency and
the angle derivative of the fractional power spectra is established.
It makes it possible to solve the signal retrieval problem if
only two close fractional power spectra are known. This fact
is used in the reconstruction of the Wigner distribution or
the pseudo Wigner distribution from two close projections.


1. INTRODUCTION

The reconstruction of a signal - and in particular its phase -
from the distributions associated with the instantaneous
power of the signal, its power spectrum or, more generally, its
fractional power spectra, is an important problem in signal
processing, radio location, optics, quantum mechanics, etc.
In spite of several successful iterative algorithms for phase
reconstruction from the squared modulus of the signal and
its power spectrum, or its Fresnel spectrum, which were
proposed recently [1]-[3], the development of noniterative
procedures remains an attractive research topic.

The fractional power spectra, which are the squared
moduli of the fractional Fourier transform (FT) [4], are now
a popular tool in optics and signal processing [4]-[11]. As
is known, they are equal to the projections of the Wigner
distribution (WD) of the signal under consideration [11, 12].
Thus, by using the tomographic approach and the inverse
Radon transform, the WD - and therefore the signal itself,
up to a constant phase factor - can be reconstructed by
knowing all its projections [5, 8]. The method is based on
the rotation in the time-frequency plane of the WD under
the fractional FT. It demands measurements of the fractional
FT spectra in the wide angular region $[0,\pi)$, which
sometimes is impossible or very costly [5].

In this paper we propose a new approach for the WD
reconstruction from only two fractional FT spectra, i.e., only
two WD projections. This approach significantly reduces
the need for projection measurements and calculations. It
is also direct and does not use iterative procedures.

The paper is organized as follows. In Section 2 we
present a review of the definition of the fractional FT, and
the relationship between the fractional FT power spectra
and the ambiguity function of a signal. In Section 3 we
establish the connection between the instantaneous frequency
in a fractional domain and the angular derivative of the
fractional FT power spectra. We show that the instantaneous
frequency is determined by the convolution of the angular
derivative of the fractional power spectra and the signum
function. In Section 4 we discuss the discrete version of the
proposed phase retrieval method and demonstrate its
efficiency on examples. The importance of the new algorithm
and its possible applications are discussed in the Conclusions.

2. FRACTIONAL POWER SPECTRA AND
AMBIGUITY FUNCTION


The fractional FT of a function $x(t)$ can be written in the
form [4]

$$X_\alpha(u) = R^{\alpha}[x(t)](u) = \int_{-\infty}^{\infty} K(\alpha,t,u)\,x(t)\,dt, \qquad (1)$$

where the kernel $K(\alpha,t,u)$ is given by

$$K(\alpha,t,u) = \frac{\exp(j\frac{1}{2}\alpha)}{\sqrt{j\sin\alpha}}\exp\left(j\pi\frac{(t^2+u^2)\cos\alpha - 2ut}{\sin\alpha}\right). \qquad (2)$$

Note that, in particular, $X_0(u) = x(u)$, $X_\pi(u) = x(-u)$,
and that $X_{\pi/2}(u)$ corresponds to a normal FT. This transform
is additive in the parameter $\alpha$, which corresponds to
the rotation angle of the coordinate system.

It is known that the fractional power spectra $|X_\alpha(u)|^2$,
i.e., the squared moduli of the fractional FT, are equal to
the projections of the WD $W_x(t,f)$ of the signal $x(t)$,

$$|X_\alpha(u)|^2 = \int\!\!\int W_x(t,f)\,\delta(t\cos\alpha + f\sin\alpha - u)\,df\,dt = \int W_x(u\cos\alpha - f\sin\alpha,\,u\sin\alpha + f\cos\alpha)\,df. \qquad (3)$$

The set of fractional power spectra in the angular region
$[0,\pi)$ is also called the Radon-Wigner transform.

Since the ambiguity function $A_x(\tau,\nu)$ is the two-dimensional
FT of the WD $W_x(t,f)$, the values of the ambiguity
function along the line defined by $\alpha$ are - according to the
Radon transform properties - equal to the FT of the WD
projection for the same $\alpha$ [6, 8],

$$A_x(R\sin\alpha, -R\cos\alpha) = \int_{-\infty}^{\infty} |X_\alpha(u)|^2 e^{j2\pi Ru}\,du. \qquad (4)$$

We can also say that the fractional power spectrum $|X_\alpha(u)|^2$
is the FT of the ambiguity function. Note that this relationship
is very important for the experimental determination of
the ambiguity function in optics, where the fractional power
spectra related to intensity distributions can be measured
by a simple optical setup [6].

3. WIGNER DISTRIBUTION PROJECTIONS
AND INSTANTANEOUS FREQUENCIES


In this section we will show that the well-known expression
for the instantaneous frequency $f_0(t)$ at the position $t$ [13]

$$f_0(t) = \frac{1}{2\pi j\,|x(t)|^2}\int_{-\infty}^{\infty} \left.\frac{\partial A_x(\tau,\nu)}{\partial\tau}\right|_{\tau=0} e^{j2\pi\nu t}\,d\nu \qquad (5)$$

can be written in terms of the local moments of the fractional
power spectra. Indeed, using the relationship [14]

$$\left.\frac{\partial A_x(\tau,\nu)}{\partial\tau}\right|_{\tau=0} = -\frac{1}{\nu}\int_{-\infty}^{\infty} \left.\frac{\partial |X_\alpha(u)|^2}{\partial\alpha}\right|_{\alpha=0} e^{-j2\pi\nu u}\,du, \qquad (6)$$

we get

$$f_0(t) = -\frac{1}{2|X_0(t)|^2}\int_{-\infty}^{\infty} \left.\frac{\partial |X_\alpha(u)|^2}{\partial\alpha}\right|_{\alpha=0} \mathrm{sgn}(t-u)\,du, \qquad (7)$$

where $\mathrm{sgn}(t)$ is the signum function:

$$\mathrm{sgn}(t) = \frac{1}{\pi j}\int_{-\infty}^{\infty} \frac{1}{\nu}\,e^{j2\pi\nu t}\,d\nu = \begin{cases} 1 & \text{for } t > 0, \\ -1 & \text{for } t < 0. \end{cases} \qquad (8)$$

We thus get, for the signal $x(t) = |X_0(t)|\exp[j\varphi(t)]$, that its
phase derivative $\varphi'(t) = d\varphi(t)/dt = 2\pi f_0(t)$ is determined
by its intensity $|X_0(t)|^2$ and the convolution of the signum
function with the angular derivative of the fractional power
spectrum $\partial|X_\alpha(u)|^2/\partial\alpha$ at the angle $\alpha = 0$. Note that this
relationship can easily be generalized for an arbitrary angle
$\alpha \neq 0$, using $|X_\alpha(t)|^2$ and $f_\alpha(t)$ [14].

In general, the complex-valued fractional FT $X_\alpha(t)$, and
in particular the signal $x(t) = X_0(t)$, can be completely
reconstructed (except for a constant phase shift) from its
intensity distribution $|X_\alpha(t)|^2$ and its instantaneous
frequency $f_\alpha(t)$. Since the instantaneous frequency is
determined by the derivative of the fractional power spectra, see
Eq. (7), this implies that only two fractional power spectra
for close angles suffice to solve the signal retrieval problem,
up to a constant phase factor. By reconstructing the signal,
up to this constant phase factor, from two fractional power
spectra (i.e., two WD projections), we can reconstruct the
whole WD. Because $x(t)$ is related to $X_\alpha(t)$ through the
inverse fractional FT, we can conclude that the signal phase
can be reconstructed up to a constant term in a noniterative
way from any two fractional power spectra taken for
close angles.

Note that this result resembles the so-called transport
of intensity equation, which deals with the Fresnel
transformation [15]-[17]. This is not surprising since both the
fractional FT and the Fresnel transform belong to the class
of canonical integral transforms and the properties of any
member of this class are related, too.


4. DISCRETIZATION AND EXAMPLES
4.1. Discretization

Here we will illustrate on some numerical examples how the
signal, up to a constant phase factor, and its WD can be
reconstructed from only two close fractional power spectra,
i.e., two WD projections. Of course, instead of reconstructing
the WD, we will actually reconstruct the pseudo WD,
but for an appropriate window function and a small angle $\alpha$,
this will not have any noticeable effect on the final results.

After two WD projections have been obtained or in
practice measured as fractional power spectra by using an
appropriate optical setup, the instantaneous frequency is
calculated as the output of the linear system, cf. Eq. (7),

$$f_0(nT) = -\frac{1}{2|X_0(nT)|^2}\left[\frac{|X_\alpha(nT)|^2 - |X_{-\alpha}(nT)|^2}{2\alpha} *_n \mathrm{sgn}(nT)\right], \qquad (9)$$

where $T$ is the discretization step, the angle $\alpha$ is small,
and $*_n$ denotes the discrete-time convolution; moreover,
in order to avoid a separate estimation of $|X_0(nT)|^2$, for
small $\alpha$ the denominator $2|X_0(nT)|^2$ can be approximated
by $|X_\alpha(nT)|^2 + |X_{-\alpha}(nT)|^2$. Note that instead of this
symmetrical version of the system, we might as well have chosen
an asymmetrical one with $-\alpha$ replaced by $0$ and $2\alpha$ by $\alpha$.

The signal, up to the constant phase factor, is reconstructed as

$$x(nT) = |X_0(nT)|\exp\Big[j\sum_{m=-N}^{n} \varphi'(mT)\,T\Big] \qquad (10)$$

and the (pseudo) WD is calculated according to its definition

$$W_x(n,k) = 2T\sum_{m} x[(n+m)T]\,x^*[(n-m)T]\,w(mT)\,e^{-j2\pi mk/N}, \qquad (11)$$

where $w(mT)$ is an appropriately chosen window function
and where $N$ is chosen such that $x(nT) \approx 0$ for $|n| > N$.

The fractional spectra $|X_\alpha(nT)|^2$ and $|X_{-\alpha}(nT)|^2$ can
be obtained in different ways: (i) measured in experiments
(a simple optical setup for the measurements of the fractional
power spectra was described in [18]); (ii) calculated
as squared moduli of the corresponding fractional FT of
$x(t)$; (iii) calculated as the Radon transform of the WD of
$x(t)$ for two angles $\pm\alpha$.
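
The two reconstruction steps, Eqs. (9) and (10), can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names and the `step` argument are hypothetical, the denominator uses the approximation $|X_\alpha(nT)|^2 + |X_{-\alpha}(nT)|^2$ mentioned in the text, and samples with negligible intensity are assigned a frequency of zero to avoid division by zero.

```python
import cmath
import math


def sign(value):
    """Signum function returning -1, 0 or 1."""
    return (value > 0) - (value < 0)


def instantaneous_frequency(p_plus, p_minus, alpha, step=1.0):
    """Estimate f_0(nT) from two close fractional power spectra, cf. Eq. (9).

    p_plus, p_minus: samples of |X_alpha|^2 and |X_{-alpha}|^2; alpha in radians.
    step: assumed scale factor of the discrete convolution (1.0 reproduces Eq. (9)).
    """
    n_samples = len(p_plus)
    # Central-difference approximation of the angular derivative at alpha = 0.
    derivative = [(a - b) / (2.0 * alpha) for a, b in zip(p_plus, p_minus)]
    estimate = []
    for n in range(n_samples):
        conv = step * sum(derivative[m] * sign(n - m) for m in range(n_samples))
        # |X_alpha|^2 + |X_{-alpha}|^2 approximates 2|X_0|^2 for small alpha.
        denominator = p_plus[n] + p_minus[n]
        estimate.append(-conv / denominator if denominator > 1e-12 else 0.0)
    return estimate


def reconstruct_signal(intensity, frequency, step):
    """Rebuild x(nT) up to a constant phase factor, cf. Eq. (10)."""
    signal, phase = [], 0.0
    for power, f0 in zip(intensity, frequency):
        phase += 2.0 * math.pi * f0 * step  # running sum of phi'(mT) T
        signal.append(math.sqrt(power) * cmath.exp(1j * phase))
    return signal
```

The direct double loop costs $O(N^2)$ operations; because convolving with the signum function amounts to a difference of two running sums, a cumulative-sum formulation would reduce this to $O(N)$ if needed.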

4.2.  Examples 

Example 1: We start with the reconstruction of a monocomponent
signal whose instantaneous frequency is a
monotonic function. Thus a signal of the form

$$x(t) = \exp[-(2.25t)^8]\exp\Big\{j\int_{-\infty}^{t}[40\pi\sinh^{-1}(100t) + 256\pi t]\,dt\Big\} \qquad (12)$$

is considered, with $T = 1/1024$. Its (pseudo) WD is calculated
by using a Hanning window $w(t)$ with width $T_w = 1/8$.
After the WD has been obtained, we assume that
only two of its projections are known, for angles $\alpha = -1°$
and $\alpha = 1°$. The projections are calculated by using the
MATLAB radon function, taking the WD matrix as the argument.
This corresponds to the case where two fractional
power spectra $|X_\alpha(nT)|^2$ and $|X_{-\alpha}(nT)|^2$ are obtained by
measurements in an optical system. The procedure described
before [cf. Eq. (9)] is then used for the reconstruction
of the signal's instantaneous frequency, its phase, and the
signal itself [Eq. (10)], from these two projections only.

Figure 1: Monocomponent signal and its WD reconstruction
from two close fractional power spectra (two WD
projections): a) Original WD, b) Projections of the WD
(Radon-Wigner transform), c) Derivative approximation:
difference of two close projections calculated at 1° and -1°,
and divided by the angle step, d) Reconstructed (dash-dot)
and original (solid line) instantaneous frequency of the signal,
e) Reconstructed (dash-dot) and original (solid line)
phase of the signal, f) Reconstructed WD.


Figure 2: Multicomponent signal and its WD reconstruction
from two close fractional power spectra (two WD
projections): a) Original WD, b) Projections of the WD
(Radon-Wigner transform), c) Derivative approximation:
difference of two close projections calculated at 1° and -1°,
and divided by the angle step, d) Reconstructed instantaneous
frequency of the signal, e) Reconstructed phase of the
signal, f) Reconstructed WD.

The original WD is given in Fig. 1a. Its Radon-Wigner
transform $|X_\alpha(nT)|^2$ [cf. Eq. (3)] is presented in Fig. 1b, for
angles $\alpha \in [0°, 180°)$. Only two projections, for $\alpha = \pm 1°$,
are used for further calculations. The difference of these
projections, $(|X_\alpha(nT)|^2 - |X_{-\alpha}(nT)|^2)/2\alpha$ for $\alpha = 1°$, is
shown in Fig. 1c. The reconstructed instantaneous frequency
and the reconstructed phase are given in Fig. 1d
and Fig. 1e, respectively, by a dash-dot line, while the original,
exact values are represented by solid lines. We can
see that the agreement between the reconstructed and the
original instantaneous frequency is very high. The phase
has a constant shift, as can be expected. The reconstructed
WD according to (11) is given in Fig. 1f.

Example 2: The reconstruction of a multicomponent
signal, having the same amplitude variation as the signal
in Example 1, but with a different phase variation,

$$x(t) = \exp[-(2.25t)^8]\Big\{\exp\Big[j\int_{-\infty}^{t}\omega_1(t)\,dt\Big] + 0.5\exp\Big[j\int_{-\infty}^{t}\omega_2(t)\,dt\Big]\Big\}, \qquad (13)$$

$$\omega_1(t) = 384\pi|t| + 256\pi, \qquad \omega_2(t) = 1024\pi t^2 + 64\pi,$$

is considered in this example. Note that the instantaneous
frequency of this signal is not a continuous function.
Nevertheless, we still obtain a satisfactory reconstruction of the
phase and the WD, using only two fractional power spectra
(see Fig. 2).

Example 3: The reconstruction algorithm is tested for
noisy signals as well. The signal from Example 1,
contaminated by Gaussian, complex-valued, white noise $\nu(t)$,

$$x(t) = \exp[-(2.25t)^8]\Big\{A\exp\Big[j\int_{-\infty}^{t}(40\pi\sinh^{-1}(100t) + 256\pi t)\,dt\Big] + \nu(t)\Big\} \qquad (14)$$

is considered. Various values of the local signal-to-noise
ratio SNR $= 20\log(A/\sigma_\nu)$ have been used in simulations.
Figure 3 presents the reconstruction result for an SNR of 9
dB. Small deviations of the reconstructed distribution can
be seen in this case. From numerous calculations, we have
concluded that the reconstruction threshold is at about
SNR = 3 dB. Below this value, the degradation of the
reconstructed distribution is significant. Nevertheless, it
seems that for signal reconstruction in very high noise, the
knowledge of several pairs of close projections would improve
the results. In that case we can calculate the differences
of the fractional power spectra for different small
angles and average them. Furthermore, using other discrete
differentiators, different from the simple one given by
a mere difference, would also improve the results in the noisy case.
However, since the original algorithm produces satisfactory
reconstruction even for an SNR as low as a few dB, we have
not implemented this variation of the algorithm, for now.

Figure 3: Noisy signal and its WD reconstruction from
two close fractional power spectra (two WD projections):
a) Original WD, b) Reconstructed WD. Signal-to-noise ratio
is 9 dB.
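
The averaging variation mentioned above, which the authors state they did not implement, could look roughly as follows. This is a hypothetical sketch: the function name `averaged_angular_derivative` and the input layout (a list of `(alpha, p_plus, p_minus)` tuples, one per pair of close projections) are assumptions introduced here.

```python
def averaged_angular_derivative(pairs):
    """Average central-difference estimates of the angular derivative.

    pairs: list of (alpha, p_plus, p_minus), where p_plus and p_minus hold
    samples of |X_alpha|^2 and |X_{-alpha}|^2 for one small angle alpha (radians).
    """
    if not pairs:
        raise ValueError("at least one pair of projections is required")
    n_samples = len(pairs[0][1])
    total = [0.0] * n_samples
    for alpha, p_plus, p_minus in pairs:
        for n in range(n_samples):
            # Each pair gives its own estimate of the derivative at alpha = 0.
            total[n] += (p_plus[n] - p_minus[n]) / (2.0 * alpha)
    return [value / len(pairs) for value in total]
```

The averaged derivative would then replace the single difference quotient in Eq. (9). Averaging reduces the noise variance of the estimate only to the extent that the noise in the different projection pairs is independent.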

5.  CONCLUSIONS 

In this paper we have established the connection between
the angular derivative of the fractional power spectra and
the instantaneous frequency, and we have proposed a method
of phase reconstruction from only two close fractional
projections of the WD. The numerical simulations show
that the discussed phase retrieval algorithm produces good
results for different types of signals. The reconstruction
technique works well for a signal-to-noise ratio as low as
about 3 dB. The main advantages of the proposed method
are that it is noniterative and that it demands a minimum
number of initial data - only two fractional FT power spectra -
which are related to easily measurable intensity distributions.
Thus in optics and quantum mechanics, the
fractional FT spectrum corresponds to the intensity distribution
and probability distribution, respectively.

We have briefly discussed the possible applications of
the angular derivatives of the fractional FT power spectra
for signal processing, which becomes especially attractive if
only the fractional projections of a signal are known.

ABSTRACT

We  present  a  method  for  predicting  non-stationary  signals 
generated  by  a  time  varying  composite  source.  The  method  is 
based  on  the  concept  of  temporal  fuzzy  clustering.  A  fuzzy 
clustering  algorithm  is  applied  to  the  given  part  (past+present)  of 
the  time  series  and  the  calculated  clusters  and  membership 
matrix  are  then  used  to  estimate  a  mixture  probability 
distribution  function  (PDF)  underlying  the  series.  In  this  way  a 
continuous  drift  in  the  series  distribution  expressed  as  a  drift  in 
the  clusters’  appearance  rate  can  be  estimated.  A  future  PDF  can 
then  be  predicted  by  fitting  a  specific  model  to  the  estimated 
past  and  future  PDF  values.  This  also  enables  the  generation  of  a 
minimal-mean-squared-error  prediction  for  a  future  time  series 
element  using  the  estimated  mean  value  of  the  predicted  PDF. 

1.  INTRODUCTION 

Many physical, biological and economic systems produce
measurable time series. Analysis of the measured series may
enable better understanding of the system and the processes
underlying it, prediction of future behavior and detection of
meaningful temporal patterns. Tools for time series analysis and
prediction were developed for a wide range of applications but
most of them share the assumption of stationarity [1]. Methods
assuming semi-stationarity, such as hidden Markov models
(HMM), may be used in cases where the series can be segmented
into stationary periods [2][3]. In many cases, however, there is a
continuous change in the probability distribution function (PDF)
of the series, in which case the semi-stationarity assumption may
impose a large error on the prediction result. Also, the series can
be composed of stationary segments with long drift periods
between them. Detection and prediction in these drift periods can
be very important (for example in medical applications, where
early alarm signs or short abnormal periods can have vital
clinical importance). Attempts were made to merge two
consecutive stationary states to achieve the ability to model such
a drift and to use methods based on artificial neural networks for
PDF estimation [4].

We  suggest  modeling  the  generator  of  non-stationary  time  series 
by  a  time  varying  mixture  of  stationary  or  semi-stationary 
sources.  We  then  combine  fuzzy  clustering  in  the  observation 
space  and  an  analysis  of  the  membership  matrix  on  the  time  scale 
in  order  to  estimate  the  model  parameters.  The  given  part 
(past+present)  series  is  clustered  as  an  unindexed  set  of 
observations  and  then  we  project  the  resulting  membership 
matrix  back  to  the  time  scale.  The  membership  matrix  is  given 
the  interpretation  of  a  continuous  temporal  change  in  the  weights 
of  a  mixture  probability  distribution  function.  By  estimating  the 
current  values  of  the  weights  from  the  membership  matrix  we  can 


derive  an  optimal  next-step  prediction  (in  the  minimal  squared 
error  sense)  of  the  series. 

In  real  time  applications  we  also  apply  the  clustering  algorithm  to 
a  set  of  given  observations  to  receive  an  initial  condition.  In  this 
stage  we  may  also  estimate  the  number  of  sources  by  using  an 
unsupervised  clustering  algorithm  and  not  only  their  parameters. 
After  receiving  this  initial  state  we  use  a  fixed-length  moving 
window  of  observations  for  continuous  update  of  the 
membership  matrix  and  also  to  recalculate  the  cluster  parameters 
if  they  are  assumed  time  varying  as  well. 

2.  METHODS 

2.1  Model  Definition 

The output of a composite source [6][7] is a discrete time series
generated from N D-dimensional, continuous sub-sources, which
are sampled by a random switching function $\{F(\theta(t)) \mid \theta(t) \in [0,1]^N\}$.
The input sources are represented by the temporal matrix
$X(t) \in \Re^{N \times D}$, where each one of its N columns contains a
random vector process with dimension D originating in a different
sub-source $X_i(t)$. The dimension D can represent a multi-channel
feature vector, a temporal sliding window or a combination of
both. $\theta(t)$ is a vector of probabilities for selecting a single
sub-source in each time sample $kt$, to be transmitted to the output:

$$y(kt) = x_i(kt) \quad \text{with probability } \theta_i(kt) \qquad (1)$$

and

$$\sum_{i=1}^{N} \theta_i(t) = 1 \quad \text{for all } -\infty < t < \infty.$$

The random switch takes a new position each time according to
the probability vector $\theta(t)$ and outputs the vector $y(t) \in \Re^{D}$, which
equals one of the columns of $X(t)$.

A  variety  of  models  for  the  temporal  behavior  of  0(t)  can  be 
assumed  depending  on  the  specific  application  or  physical 
phenomenon  and  thus  enabling  the  description  of  a  wide  range  of 
time  series. 
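
The composite-source model can be illustrated with a short simulation. The sketch below is only an illustration under stated assumptions: the sub-sources are taken to be scalar Gaussian processes (D = 1), the mixture weights follow a linear ramp, and the function name `simulate_composite_source` and all numerical values are hypothetical rather than taken from the paper.

```python
import numpy as np

def simulate_composite_source(theta, means, std, rng):
    """Draw one scalar output sample per time step from a switched mixture.

    theta : (L, N) array; row n holds the selection probabilities at step n
    means : (N,) sub-source means (Gaussian sub-sources are an assumption)
    std   : common standard deviation of the sub-sources
    """
    theta = np.asarray(theta, dtype=float)
    L, N = theta.shape
    # Each row must be a probability vector (the weights sum to one).
    assert np.allclose(theta.sum(axis=1), 1.0)
    # Random switch: choose one sub-source index at every time step.
    switch = np.array([rng.choice(N, p=theta[n]) for n in range(L)])
    # Every sub-source emits a sample; only the selected one is output.
    x = rng.normal(loc=means, scale=std, size=(L, N))
    y = x[np.arange(L), switch]
    return y, switch

rng = np.random.default_rng(0)
L = 2000
ramp = np.linspace(0.0, 1.0, L)
# Hypothetical regime: weight moves linearly from sub-source 0 to sub-source 1.
theta = np.column_stack([1.0 - ramp, ramp])
y, switch = simulate_composite_source(theta, np.array([10.0, 50.0]), 1.0, rng)
```

Early samples of `y` come almost entirely from the first sub-source and late samples from the second, which is the kind of continuous change in regime the model is meant to describe.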

2.2  Prediction  Method 

In [8] we presented an algorithm for the estimation of $\theta(t)$
using temporal clustering, which is described briefly here.

The clustering space is composed of L sampled points from the
time series $\{y(n) \in \Re^D \mid 1 \le n \le L\}$. A fuzzy partition
is defined by a set of N cluster means (prototypical elements)
$\{\mu_i \in \Re^D \mid 1 \le i \le N\}$ and the membership matrix
$U \in [0,1]^{L \times N}$ of each element $y_n$ in each cluster
$C_i$ with prototype $\mu_i$. The clustering procedure that we are
currently using is the hierarchical unsupervised fuzzy clustering
(HUFC) algorithm recently presented in [5], which seems to better
fit the non-stationary and transient nature of the given time
series. After clustering we return to the time axis and divide the
series into a set of L/K segments, each including K samples. A
moving average of the membership matrices is then used to estimate
a sampled version of $\theta(t)$ for each partition:

$$\hat{\theta}_i^N(k) = \frac{1}{K} \sum_{j=K(k-1)+1}^{Kk} u_{i,j}^N \quad \text{for } 1 \le k \le \frac{L}{K};\; 1 \le i \le N;\; N_{\min} \le N \le N_{\max} \tag{3}$$

where $u_{i,j}^N \in U^N$ is the estimated membership of the j-th
sample in cluster i of the N-th partition. The result for each
partition is a sampled version of the estimated $\theta(k)$ at a
sampling rate of $f_s/K$, where $f_s$ is the sampling rate of the
given time series.

We consider two tasks regarding the prediction of non-stationary
time series generated by a time-varying composite source. The
first task, predicting future values of the series PDF, can be
performed by using a model for the behavior of $\theta(t)$. The type
of the model can be selected using additional information related
to the application at hand or by analyzing a long baseline period.
The model may describe a deterministic process (for example a
linear trend, or a periodic one that generates a cyclo-stationary
series) or a random one (for example a Markov chain that will
generate an HMM-like series). This issue is highly dependent on
the specific time series or application at hand and will not be
addressed here in detail. The second task, on which we will focus,
is predicting the next future element of the series by using the
results of temporal clustering, i.e. the cluster prototypes and
the temporal regime $\theta(t)$.

Given the time series:

$$\{y_n \in \Re^D \mid L \ge n \ge 1\} \tag{4}$$

$\theta_n$ can be viewed as a sampled version of the continuous
time-varying signal $\theta(t)$:

$$\theta = \{\theta_{i,n} \mid 1 \le i \le N,\; 1 \le n \le L\}$$

where N is the number of sub-sources and L is the number of
samples in the series.

If we can forecast a future value for the vector
$\theta_{n+\Delta n}$ then, given all prototypical elements for each
of the sub-sources, a prediction for the element $y_{n+\Delta n}$
can be formulated. The optimal predictor in the Minimal Mean Square
Error (MMSE) sense is given by:

$$\hat{y}_{n+\Delta n} = E\{y_{n+\Delta n} \mid y^n\} \tag{5}$$

where

$$y^n = \{y_1, \ldots, y_n\} \tag{6}$$

Given that all sub-sources are i.i.d. we can calculate all optimal
predictors from the estimated means:

$$[\hat{x}_{1,n+\Delta n}, \ldots, \hat{x}_{N,n+\Delta n}] = \{E\{x_{1,n+\Delta n} \mid y^n\}, \ldots, E\{x_{N,n+\Delta n} \mid y^n\}\} = \hat{\Omega} = \{\hat{\mu}_i \mid 1 \le i \le N\} \tag{7}$$

where N is the number of sub-sources in the composite source.

By estimating the future a priori probabilities vector (APV) for
each sub-source, $\hat{\theta}_{n+\Delta n}$, we can estimate the
future value of the time series $y_{n+\Delta n}$.

We conclude that the MMSE predictor for our model is given by:

$$\hat{y}_{n+\Delta n} = E\{y_{n+\Delta n} \mid y^n\} = \sum_{i=1}^{N} p(y_{n+\Delta n} = x_{i,n+\Delta n} \mid y^n)\, \hat{\mu}_i \tag{8}$$

We recall that

$$\theta_{i,n} = p(y_n = x_{i,n}) \tag{9}$$

The estimation of the APV is given by:

$$\hat{\theta}_{i,n|n} \equiv p(y_n = x_{i,n} \mid y^n) \tag{10}$$

For simplicity of presentation we shall use the notation:

$$p(x_{i,n}) = p(y_n = x_{i,n})$$

The sub-source expectations are estimated by:

$$\hat{\mu}_i = E\{x_i \mid y^n\} = \sum_{j=1}^{n} \theta_{i,j}\, y_j \tag{11}$$

We get:

$$\hat{y}_{n+\Delta n} = E\{y_{n+\Delta n} \mid \hat{\theta}_n, \hat{\Omega}\} = \sum_{i=1}^{N} \hat{\theta}_{i,n+\Delta n|n}\, \hat{\mu}_i \tag{12}$$

Temporal clustering offers the ability to calculate both
$\hat{\mu}$ and $\hat{\theta}_i$ simultaneously and allows for a
flexible trade-off between time and frequency resolution in
estimating the temporal change of the APV $\theta_n$.
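
The prediction step can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes scalar observations, takes the membership matrix and cluster means as already computed by some fuzzy clustering routine, and uses the average membership over the most recent K samples as the forecast of the mixture weights (a simple persistence assumption). The names `estimate_theta` and `predict_next` and all numbers are hypothetical.

```python
import numpy as np

def estimate_theta(U, K):
    """Average the membership matrix U (L x N) over its last K rows."""
    U = np.asarray(U, dtype=float)
    return U[-K:].mean(axis=0)

def predict_next(theta_hat, mu_hat):
    """Mixture-weighted sum of the cluster means, as in the final predictor."""
    return float(np.dot(theta_hat, mu_hat))

# Hypothetical memberships of 6 samples in N = 2 clusters (rows sum to 1).
U = np.array([[0.9, 0.1],
              [0.8, 0.2],
              [0.7, 0.3],
              [0.2, 0.8],
              [0.1, 0.9],
              [0.0, 1.0]])
mu_hat = np.array([10.0, 50.0])      # hypothetical estimated cluster means

theta_hat = estimate_theta(U, K=3)   # weights from the last three samples
y_hat = predict_next(theta_hat, mu_hat)
```

With these numbers the recent window is dominated by the second cluster, so the prediction lies close to its mean. A shorter window K tracks regime changes faster at the cost of a noisier weight estimate.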


3.  RESULTS 

We now present two examples of time series prediction tasks
performed by temporal clustering. First, a non-stationary time
series with 5000 elements was generated by a random source
composed of a continuous time-varying mixture of 3 stationary
Gaussian sub-sources. All sub-source variances were 10 and the
means were 10, 50 and 90. The time-varying mixture used for the
source was a combination of exponentially and linearly changing
expressions with step functions. The 3 mixture weights are
presented in the upper 3 traces of figure 1. The resulting time
series is presented in the bottom trace. The segments used for
training and testing are marked on the figure. The training period
was used to extract the source parameters, namely the source
means, by unsupervised fuzzy clustering (the results were 88.8,
49.5 and 10.0). The estimated means were used to classify all the
time series elements during the test period and to produce a
prediction for each element from its past window of 30 elements
(i.e. for predicting element i we used elements i-31, i-30, .., i-1).
The resulting time series prediction is drawn in the middle trace
of figure 2a against the original test period presented in the
upper trace. The prediction error is presented in the lower trace.
Figure 2b presents a small portion of the time series (connected
circles) with the corresponding predicted values (connected x).

In figure 3 the predictions for $\theta$ are drawn against the
original temporal mixture probabilities (3 upper traces) together
with the overall classification mean squared error (bottom trace),
defined as the weighted average of the error in estimating
$\theta_i$.

We can see that when there is a low error in the estimation of
$\theta$, the prediction error is mainly related to the error in
estimating the sub-source parameters. See for example the points
before n=1000 in figure 2a, which have very low classification
error. Around n=500 (in figure 2a), where the distance between
the confused cluster means is relatively small, there is a low
prediction error even though there is a substantial classification
error. When there is a large error in the estimation of $\theta$,
the prediction error term depends on the distribution of the
sub-source means. Around n=750 there is a peak in classification
error, which also results in a large variance in the prediction
error. In figure 3 we can also see the tracking capability of the
algorithm when there are changes in the time series distribution.
The tracking speed depends on the length of the segment used to
average the membership and on the sensitivity of the membership to
the distance from the cluster mean. In figure 2b we can see the
change in the prediction value due to a change in the estimated
mixture in a small window of observations.


Figure 1 - A simulated example of a time-varying composite
source produced by 3 Gaussian sources with means 10, 50 and 90,
all with variance 10. The three upper figures show the
temporal mixture a priori probabilities. The bottom figure
shows the resulting time series (horizontal axis: sample number,
0 to 5000). A vertical line separates the training and test
periods used for the simulation.

Figure 2a - Original time series (upper figure), predicted time
series (middle figure) and prediction error (bottom figure)
during the test period.

Figure 2b - Original time series (circle trace) and predicted
time series (X trace) of a portion of the simulated time series.

Figure 3 - Original (thin line) and predicted (thick line)
temporal mixture probabilities in the 3 upper figures
(corresponding to sub-sources with means 10, 50 and 90).
Mean squared classification error (bottom figure).


Next, we applied the prediction algorithm to a time series
representing the RR intervals (time intervals between
consecutive heart beats) of rats with hyperbaric-oxygen-induced
generalized seizures. The rats were implanted with chronic
surface cortical electrodes and subcutaneous ECG electrodes
and exposed to pure oxygen in a pressure chamber. Selected
sections of the ECG were digitized at a sampling rate of 1000 Hz.
All digitized sections were then analyzed by software developed
by the Israeli Naval Medical Institute. All irregular and
pathological beats were retained in the analyzed series. The
final output of the software was an indexed list of consecutive
RR intervals (RRi). The list is converted into a point array in
an N-dimensional space, the axes being the durations RRi(n),
RRi(n+1), ..., RRi(n+N-1) (lag plots).

The RR-interval time series extracted from minutes 18-25 at
pressure, in a rat that seized after 25 min, is shown in the upper
part of figure 4. The middle trace of figure 4 presents the
prediction results obtained using the algorithm and the lower
trace presents the prediction error. The input for the algorithm
was a 3-point window of previous elements (i.e. the prediction
for x(i) was obtained using x(i-3), x(i-2), x(i-1)). The series
was fuzzy partitioned into 5 clusters and the resulting membership
matrix was averaged over each consecutive 4 points.

The first 18 minutes of the RR series were defined as the training
phase and were used to select the best partition of the data into
clusters. In the prediction stage we used the number of clusters
and the cluster prototypes obtained from the training stage and
produced the prediction presented in figure 4.


Figure 4 - Measured (upper trace) and predicted (middle
trace) RR series of a rat exposed to high oxygen pressure.
The prediction error is drawn in the bottom trace.


The prediction of elements farther than the adjacent future
demands more assumptions regarding the process underlying
$\theta_n$ and should be treated with a specific application in
mind.

The algorithm can be enhanced by re-estimating the cluster
prototypes in a moving window (longer than the one used to
calculate $\hat{\theta}_n$). This may be unavoidable for a long
series and can also be used to compensate for a drift in the
sub-source means.

We conclude by stating that our future goal is to investigate
temporal clustering as a tool for other signal processing tasks,
in comparison with common signal processing methods.

4. SUMMARY

The presented method for prediction of non-stationary time series
can be used for time series with a continuous change in regime
that can be described by a time-varying mixture PDF. When the
non-stationarity of the series is mainly caused by changes in the
PDF of the sub-sources and not in the mixture (for example a single
Gaussian noise source with a drift in its mean value), the method
is expected to be less adequate for prediction.

The suggested prediction procedure can also be used as a
non-linear filtering scheme that may be especially useful in
applications where clusters of outliers are expected to appear
with some probability and add a non-stationary and non-linear
aspect to an otherwise stationary linear baseline, as in the RR
example.

ABSTRACT

This contribution introduces a new Time Frequency
Distribution based on a kernel with compact support.
The properties of the new distribution are emphasized.
Through a parameter that controls the kernel width,
the new representation allows a tradeoff between a good
autoterm resolution and a high crossterm rejection.
A signal representation example is provided and compared
with the Wigner distribution, the Choi-Williams
distribution, the spectrogram and the Born-Jordan
distribution as a member of the Reduced Interference
Distribution class.

1.  INTRODUCTION 

Time Frequency distributions are increasingly widely
used for nonstationary signal analysis. They perform a
mapping of a one-dimensional signal x(t) into a
two-dimensional function of time and frequency $TFD_x(t,f)$.
Herein, we are interested in Cohen's class of
distributions. The general expression of a member of this
class is given by [1],

$$TFD_x(t,f) = \iiint \phi(\eta,\tau)\, x\!\left(t'+\frac{\tau}{2}\right) x^H\!\left(t'-\frac{\tau}{2}\right) e^{-j2\pi\eta t}\, e^{-j2\pi\tau f}\, e^{j2\pi\eta t'}\, dt'\, d\tau\, d\eta \tag{1}$$

where t and f represent time and frequency, respectively,
and H the transpose conjugate operator. The
kernel $\phi(\eta,\tau)$ determines the main properties of the
resulting Time Frequency Distribution (TFD). Many authors
[2, 1] start from Cohen's class of distributions
to define kernels whose main property is to reduce the
interference patterns induced by the distribution itself.


In this contribution [3], we propose to use a new kernel
derived from the Gaussian kernel [4]. Unlike the Gaussian
kernel, the new kernel has the compact support
analytical property, i.e., it vanishes outside a given
compact set. Hence, it recovers the information loss
that occurs for the Gaussian kernel due to truncation
and improves the processing time. Moreover, the compact
support kernel keeps the most important properties
of the Gaussian kernel. This compact support
property is different from the finite support property
of time frequency representations. It turns out that,
through a control parameter, the new time frequency
representation allows a tradeoff between a good autoterm
resolution and a high cross term rejection. In the
next section, the expression of this new kernel is given
together with the properties of the induced time
frequency distribution. Finally, a signal representation
example of two crossing chirps is given and compared
with the Wigner distribution [1], the Choi-Williams
distribution [2], the spectrogram and the Born-Jordan
distribution as a member of the Reduced Interference
Distribution class [5].

2.  THE  NEW  KERNEL 

The new kernel is derived from the Gaussian kernel by
transforming the $\Re^2$ space into a unit ball through a
change of variables. This transformation packs all the
information in the unit ball. With the new variables,
the Gaussian is defined on the unit ball and vanishes
on the unit sphere. Then, it is extended over all the $\Re^2$
space by taking zero values outside the unit ball. The
obtained kernel still belongs to the space of functions
with derivatives of any order. The new kernel, referred
to as the Compact Support Kernel (CSK), has the following
expression [4],

$$\phi(\eta,\tau) = \begin{cases} e^{\frac{\gamma}{\eta^2+\tau^2-1} + \gamma} & \text{if } \eta^2+\tau^2 < 1 \\ 0 & \text{elsewhere} \end{cases} \tag{2}$$

where $\gamma$ is a parameter that controls the kernel width.
Figure 1 shows the Compact Support Kernel (CSK)
with $\gamma = 5.5$.
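
To make the kernel definition concrete, the following sketch evaluates it numerically. It assumes the kernel has the form exp(gamma / (eta^2 + tau^2 - 1) + gamma) inside the unit ball and zero outside, which is how expression (2) is read here; the function name `compact_support_kernel` is hypothetical.

```python
import numpy as np

def compact_support_kernel(eta, tau, gamma):
    """Evaluate the compact support kernel on scalars or arrays."""
    eta = np.asarray(eta, dtype=float)
    tau = np.asarray(tau, dtype=float)
    r2 = eta**2 + tau**2
    inside = r2 < 1.0
    # Use a dummy denominator outside the unit ball to avoid dividing by zero.
    denom = np.where(inside, r2 - 1.0, -1.0)
    return np.where(inside, np.exp(gamma / denom + gamma), 0.0)

gamma = 5.5                                   # value quoted for Figure 1
center = compact_support_kernel(0.0, 0.0, gamma)
midway = compact_support_kernel(0.5, 0.0, gamma)
outside = compact_support_kernel(1.0, 0.5, gamma)

# Sample the kernel on a grid covering slightly more than the unit ball.
axis = np.linspace(-1.2, 1.2, 121)
grid = compact_support_kernel(axis[:, None], axis[None, :], gamma)
```

Under this reading the kernel equals 1 at the origin, decays smoothly toward the unit circle and is exactly zero beyond it, so no truncation is needed when it is sampled on a finite grid.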


2.1.  Features  of  the  new  kernel 

Let us first recall two practical limitations of the Gaussian
kernel: information loss due to diminished accuracy
when the Gaussian is cut off to compute the time
frequency distribution, and the prohibitive processing
time due to the mask's width, which is increased to
minimize the accuracy loss. The main features of the
new kernel are that it recovers the above information loss,
improves the processing time and retains the most
important properties of the Gaussian kernel [4]. These
features are achieved thanks to the compact support
analytical property of the new kernel. This compact
support property means that the kernel vanishes
outside a given compact set.

3.  THE  NEW  TIME  FREQUENCY 
DISTRIBUTION 

The distribution resulting from the above compact support
kernel (2) does not satisfy the marginal property,
just like the spectrogram. It is consistent with energy
conservation ($\phi(0,0) = 1$) and satisfies both the
reality property and the time and frequency shift properties.

The waveform of any kernel $\phi(\eta,\tau)$ determines the
autoterm resolution and the cross term reduction of a
time frequency distribution. Note that there is a tradeoff
between the autoterm resolution and the interference
suppression [5]. The wider the kernel, the more the
resulting distribution suffers from interference
while maintaining good autoterm resolution. On
the other hand, the narrower the kernel, the better
the interference term suppression, at the expense of
the autoterm resolution.

The new kernel (2) has by definition a limited width
extent since it has a compact support. Within this extent,
its width can be controlled through the parameter $\gamma$
to allow a tradeoff between a good autoterm resolution
and a sufficient cross term suppression.

Compared to the Reduced Interference Distributions
(RIDs), the distribution resulting from the new kernel
does not satisfy all the distribution properties [5].
However, in contrast to these RIDs, it provides a good
tradeoff between autoterm resolution and cross term
suppression.

Note that a new RID kernel can be derived from the
kernel of (2) following the design procedure proposed
in [5].


4.  EXPERIMENTAL  RESULTS 

In order to assess the performance of the new
distribution, we consider four typical time frequency
distributions (the Wigner distribution, the Choi-Williams
distribution, the spectrogram and the Born-Jordan distribution
as a member of the RID class [5]). In this section, an
example of two crossing chirps is shown which clearly reveals
the differences in performance among the five
distributions.

In figures 2, 3, 4, 5 and 6, the time frequency
representation of the two crossing chirps is plotted using the
CSK TFD, the Wigner TFD, the Choi-Williams TFD,
the spectrogram and the Born-Jordan TFD (as a member
of the RID class), respectively. The new time
frequency representation shows its ability to remove the
cross terms and presents sharp curves in contrast to the
other representations.


Figure 1: The Compact Support Kernel.

ABSTRACT

The Wigner distribution (WD) can be decomposed into a linear
combination of elementary WDs. Slow-oscillatory elementary
WDs and fast-oscillatory elementary WDs mainly contribute to
auto-terms and cross-terms, respectively. Using a weight
function to keep slow-oscillatory elementary WDs and attenuate
fast-oscillatory elementary WDs, one can balance auto-term
resolution and cross-term suppression and obtain a weighted
Wigner distribution (WWD).

1. INTRODUCTION

Time-frequency representations (TFRs) describe the variation of
signals simultaneously in time and frequency. The short-time
Fourier transform (STFT) [1,2], a typical linear TFR, finds wide
use in practice. Its performance, however, is restricted by the
tradeoff between its time resolution and frequency resolution.
The 2-norm of the STFT is known as the spectrogram [2].

Quadratic TFRs, also called time-frequency distributions, are
loosely interpreted as two-dimensional (2-D) signal energy
densities in the time-frequency domain. Cohen's class of
shift-covariant TFRs includes all quadratic TFRs that satisfy the
time-shift and frequency-shift covariance properties [1,2]. The
spectrogram and the Wigner distribution (WD), prominent
members of Cohen's class, can be viewed as "opposites." The
WD portrays optimal resolution of auto-terms and no cross-term
attenuation whereas the spectrogram portrays poor auto-term
resolution and massive reduction of cross-terms [1,2]. The
cross-terms in the WD greatly restrict its practical use [3].

In the WD, cross-terms oscillate and auto-terms vary slowly.
Hence, cross-terms can be suppressed by convolving the WD
with a 2-D low-pass, fixed kernel (or filter) [1,2]. The resulting
TFR is a member of Cohen's class and corresponds to a
smoothed WD [2]. The attenuation of cross-terms, however,
usually comes at the expense of auto-term resolution [3].

The time-frequency distribution series (TFDS) of [4]
attenuates cross-terms differently. The WD is decomposed into a
linear combination of elementary WDs. Slow-oscillatory
elementary WDs and fast-oscillatory elementary WDs mainly
contribute to auto-terms and cross-terms, respectively. In the
TFDS, auto-term resolution and cross-term suppression are
balanced by keeping slow-oscillatory elementary WDs and
discarding fast-oscillatory elementary WDs.

Our work, inspired by the TFDS idea, makes progress in: (1)
deriving a continuous-parameter decomposition of the WD; (2)
generalizing the weight function of the TFDS; and (3) presenting
an approach for choosing the sampling intervals of the
parameters.

2. A WEIGHTED DECOMPOSITION OF THE WD

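The STFT of a signal x(t) is defined as¹

$$X^{(h)}(t,f) = \int x(\alpha)\, h^*(\alpha - t) \exp(-j2\pi f\alpha)\, d\alpha \tag{1}$$

where h(t) is an analysis window [2]. x(t) can be recovered from
its STFT by

$$x(t) = \iint X^{(h)}(\alpha,\theta)\, g(t-\alpha) \exp(j2\pi\theta t)\, d\alpha\, d\theta \tag{2}$$

where g(t) is a synthesis window satisfying [2]

$$\int g(t)\, h^*(t)\, dt = 1. \tag{3}$$

The cross-WD of two signals x(t) and y(t) is defined by [2]

$$W_{x,y}(t,f) = \int x\!\left(t+\frac{\tau}{2}\right) y^*\!\left(t-\frac{\tau}{2}\right) \exp(-j2\pi f\tau)\, d\tau. \tag{4}$$

Taking the WD of (2), we get

$$W_x(t,f) = \iiiint X^{(h)}(\alpha,\theta)\, X^{(h)*}(\beta,\nu)\, W_{\tilde{g},\hat{g}}(t,f,\alpha,\theta,\beta,\nu)\, d\alpha\, d\theta\, d\beta\, d\nu \tag{5}$$

where $W_{\tilde{g},\hat{g}}(t,f,\alpha,\theta,\beta,\nu)$ is the cross-WD of

$$\tilde{g}(t) = g(t-\alpha) \exp(j2\pi\theta t) \tag{6}$$

and

$$\hat{g}(t) = g(t-\beta) \exp(j2\pi\nu t). \tag{7}$$

Eq. (5) shows that the WD can be decomposed into a linear
combination of elementary WDs. Approximating the integrations
by summations in (5), we obtain

$$W_x(t,f) = T^2F^2 \sum_n \sum_k \sum_p \sum_m X^{(h)}(nT,kF)\, X^{(h)*}(pT,mF)\, W_{\tilde{g},\hat{g}}(t,f,nT,kF,pT,mF). \tag{8}$$

Eq. (8) is essentially the decomposition of the WD in [4].
$W_{\tilde{g},\hat{g}}(t,f,\alpha,\theta,\beta,\nu)$ can be expressed as

$$W_{\tilde{g},\hat{g}}(t,f,\alpha,\theta,\beta,\nu) = W_g\!\left(t-\frac{\alpha+\beta}{2},\, f-\frac{\theta+\nu}{2}\right) \exp\{j\pi[(\theta+\nu)(\alpha-\beta)] + j2\pi[(\theta-\nu)t + (\beta-\alpha)f]\}. \tag{9}$$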
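
The closed form of an elementary cross-WD can be checked numerically. The sketch below is an illustration only: it assumes a Gaussian window exp(-t^2), picks arbitrary shift parameters, approximates the cross-WD integral of (4) by a Riemann sum, and takes the first phase term as (theta + nu)(alpha - beta), which is what follows from substituting the elementary signals (6) and (7) into (4). All names and numbers are hypothetical.

```python
import numpy as np

def cross_wd(x, y, t, f, tau):
    """Riemann-sum approximation of the cross-WD at a single point (t, f)."""
    dtau = tau[1] - tau[0]
    integrand = (x(t + tau / 2) * np.conj(y(t - tau / 2))
                 * np.exp(-2j * np.pi * f * tau))
    return np.sum(integrand) * dtau

def g(u):
    """Assumed synthesis window, chosen only for illustration."""
    return np.exp(-u**2)

alpha, theta, beta, nu = 0.4, 1.0, -0.3, 0.5   # arbitrary shift parameters

def g_tilde(u):
    """Elementary signal shifted by alpha in time and theta in frequency."""
    return g(u - alpha) * np.exp(2j * np.pi * theta * u)

def g_hat(u):
    """Elementary signal shifted by beta in time and nu in frequency."""
    return g(u - beta) * np.exp(2j * np.pi * nu * u)

tau = np.linspace(-20.0, 20.0, 8001)
t, f = 0.2, 0.6

# Left-hand side: cross-WD of the two elementary signals, computed directly.
lhs = cross_wd(g_tilde, g_hat, t, f, tau)

# Right-hand side: shifted WD of the window times the oscillating factor.
envelope = cross_wd(g, g, t - (alpha + beta) / 2, f - (theta + nu) / 2, tau)
phase = (np.pi * (theta + nu) * (alpha - beta)
         + 2 * np.pi * ((theta - nu) * t + (beta - alpha) * f))
rhs = envelope * np.exp(1j * phase)
```

The two sides agree to numerical precision for this window, which illustrates that an elementary WD is the window's own WD centred halfway between the two elementary signals and modulated by a complex exponential.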

Eq. (9) shows that an elementary WD is halfway between the
corresponding elementary signals, has the same envelope as
$W_g(t,f)$, and oscillates at the "frequency" of $\theta-\nu$ in
time and at the "frequency" of $\alpha-\beta$ in frequency.

In the WD, auto-terms vary slowly and cross-terms oscillate.
Hence, in (5) slow-oscillatory elementary WDs and
fast-oscillatory elementary WDs mainly contribute to auto-terms
and cross-terms, respectively. Using a weight function to preserve
slow-oscillatory elementary WDs and attenuate fast-oscillatory
elementary WDs in (5), auto-term resolution and cross-term
suppression can be balanced, i.e.,

$$\tilde{W}_x(t,f) = \iiiint X^{(h)}(\alpha,\theta)\, X^{(h)*}(\beta,\nu)\, \lambda(\alpha,\theta,\beta,\nu)\, W_{\tilde{g},\hat{g}}(t,f,\alpha,\theta,\beta,\nu)\, d\alpha\, d\theta\, d\beta\, d\nu. \tag{10}$$

We call $\tilde{W}_x(t,f)$ in (10) the weighted Wigner distribution
(WWD). The weight function $\lambda(\alpha,\theta,\beta,\nu)$ is
generally even in both $\alpha-\beta$ and $\theta-\nu$. It usually
tends to decrease with increasing $|\alpha-\beta|$ and
$|\theta-\nu|$. A slower-decreasing $\lambda(\alpha,\theta,\beta,\nu)$
usually means higher auto-term resolution and lower cross-term
reduction. The WD can be obtained from (10) by letting
$\lambda(\alpha,\theta,\beta,\nu) = 1$.

¹ Unless otherwise noted, all integrations are from $-\infty$ to $\infty$.

If the integrations in (10) are approximated by summations
and the weight function is chosen as

$$\lambda(n,k,p,m) = 1, \quad |n-p| + |k-m| \le d, \tag{11}$$

we get

$$\tilde{W}_x(t,f) = T^2F^2 \mathop{\sum_n \sum_k \sum_p \sum_m}_{|n-p|+|k-m| \le d} X^{(h)}(nT,kF)\, X^{(h)*}(pT,mF)\, W_{\tilde{g},\hat{g}}(t,f,nT,kF,pT,mF). \tag{12}$$

Eq. (12) is essentially the TFDS with order d of [4].
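
The effect of the order d can be illustrated by counting how many pairs of elementary signals survive the binary weight. The sketch below assumes the weight is 1 when |n-p| + |k-m| <= d and 0 otherwise, which is how (11) and the restricted sum in (12) are read here; the function names and grid sizes are hypothetical.

```python
def tfds_weight(n, k, p, m, d):
    """Binary weight: keep a pair if its index distance is at most d."""
    return 1.0 if abs(n - p) + abs(k - m) <= d else 0.0

def kept_pairs(num_t, num_f, d):
    """Count ordered index pairs (n,k),(p,m) with non-zero weight."""
    count = 0
    for n in range(num_t):
        for k in range(num_f):
            for p in range(num_t):
                for m in range(num_f):
                    if tfds_weight(n, k, p, m, d) > 0:
                        count += 1
    return count

# Hypothetical 3 x 3 time-frequency grid of STFT samples.
only_self_terms = kept_pairs(3, 3, d=0)   # each elementary signal with itself
all_terms = kept_pairs(3, 3, d=4)         # every pair is kept
```

With d = 0 only the elementary auto-WDs remain, while a d at least as large as the grid diameter keeps every term and the sum reduces to the full decomposition of the WD. Intermediate values trade cross-term suppression against auto-term resolution.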

3.  THE  WWD  AND  COHEN’S  CLASS 

The TFR of a signal x(t) is said to belong to Cohen's fixed kernel
shift-covariant class if and only if the TFR is a 2-D filtered WD,
i.e., the TFR can be expressed by

$$T_x(t,f) = \iint \varphi(t-t', f-f')\, W_x(t',f')\, dt'\, df' \tag{13}$$

where $\varphi(t,f)$ is a fixed (signal independent) kernel [2].
Eq. (13) can be equivalently written as [2]

$$T_x(t,f) = \iint \Psi(\tau,\nu)\, A_x(\tau,\nu) \exp[j2\pi(\nu t - f\tau)]\, d\tau\, d\nu \tag{14}$$

where the kernel function $\Psi(\tau,\nu)$ and the ambiguity function
$A_x(\tau,\nu)$ are the 2-D Fourier transforms (a Fourier transform
with respect to t and an inverse Fourier transform with respect to
f) of $\varphi(t,f)$ and $W_x(t,f)$, respectively.

Substitution of (5) into (13) yields

$$T_x(t,f) = \iiiint X^{(h)}(\alpha,\theta)\, X^{(h)*}(\beta,\nu) \left\{ \iint \varphi(t-t', f-f')\, W_{\tilde{g},\hat{g}}(t',f',\alpha,\theta,\beta,\nu)\, dt'\, df' \right\} d\alpha\, d\theta\, d\beta\, d\nu \tag{15}$$

where $\{\cdot\}$ denotes a filtered elementary WD. Comparing (10) to
(15), we see that the WWDs and the TFRs of Cohen's class have
similar forms. However, the WWD of (10) results from weighted
elementary WDs whereas the TFRs given by (15) result from
filtered elementary WDs.

4.  WWD  ALGORITHM 

4.1.  Algorithm 

The algorithm for the WWD (10) proceeds as follows.

(a) Find the STFT $X^{(h)}(t,f)$.

(b) Find $W_g(t,f)$.
$W_g(t,f)$ is used in the computation of every elementary WD
and, thus, its computation is treated as a separate step. Note
that if g(t) is chosen in such a way that $W_g(t,f)$ has a
closed-form expression, the computation can be carried out from
this expression and, consequently, the full Nyquist bandwidth can
be obtained without over-sampling [5].

(c) Find $\tilde{W}_x(t,f)$. For each pair of elementary signals
whose weight is non-zero, the corresponding pair of elementary
WDs is computed and weighted. All the weighted elementary
WDs are summed to obtain $\tilde{W}_x(t,f)$.

4.2.  Sampling  of  the  STFT 

The computation of the STFT involves its sampling in time and
in frequency. The sampling intervals used affect the precision
and the speed of the algorithm considerably. Smaller sampling
intervals give higher precision but slower speed. An empirical
method for choosing the sampling intervals is given in [4,6].
Here, we solve the problem theoretically.

The discrete version of $\tilde{W}_x(t,f)$ can be obtained by
discretizing (10). Also, it can be derived from the inverse
sampled STFT, also known as the Gabor expansion [4,6,7]. The
sampled STFT is defined as

$$X^{(h)}(nT,kF) = \int x(t)\, h^*(t-nT) \exp(-j2\pi kFt)\, dt \tag{16}$$

where T and F are the sampling intervals of time and frequency,
respectively. The inverse sampled STFT reconstructs the signal
from the sampled STFT, i.e.,

$$x(t) = TF \sum_n \sum_k X^{(h)}(nT,kF)\, g(t-nT) \exp(j2\pi kFt). \tag{17}$$

Eq. (17) holds if

$$\int g(t)\, h^*\!\left(t-\frac{m}{F}\right) \exp\!\left(-j2\pi\frac{n}{T}t\right) dt = \delta(m)\,\delta(n). \tag{18}$$

Taking the WD of (17), we get

$$W_x(t,f) = T^2F^2 \sum_n \sum_k \sum_p \sum_m X^{(h)}(nT,kF)\, X^{(h)*}(pT,mF)\, W_{\tilde{g},\hat{g}}(t,f,nT,kF,pT,mF). \tag{19}$$

Using a weight function and discretizing t and f, we obtain the
discrete version of the WWD of (10), i.e.,

$$\tilde{W}_x(n'T_s, k'F_s) = T^2F^2 \sum_n \sum_k \sum_p \sum_m X^{(h)}(nT,kF)\, X^{(h)*}(pT,mF)\, \lambda(nT,kF,pT,mF)\, W_{\tilde{g},\hat{g}}(n'T_s, k'F_s, nT,kF,pT,mF). \tag{20}$$

The above development shows that if h(t), g(t), T, and F are
chosen such that (18) is satisfied, (17) and thus (20) will hold.
Hence, since the following conditions are sufficient for the
validity of (18), they can be used as the criteria for choosing
h(t), g(t), T, and F:

A. h(t) and g(t) are real even window functions.

B. h(t) and g(t) satisfy (3).

C. If $F_h$ and $F_g$ denote the bandwidths of h(t) and g(t),
respectively, then

$$T \le \frac{2}{F_h + F_g}. \tag{21}$$

D. If $T_h$ and $T_g$ are the widths of h(t) and g(t),
respectively, then

$$F \le \frac{2}{T_h + T_g}. \tag{22}$$

Conditions A and D guarantee that (18) holds when $m \ne 0$.
Conditions A and C guarantee that (18) holds when m = 0 and
$n \ne 0$. When m = 0 and $n \ne 0$, (18) becomes

$$\int g(t)\, h(t) \exp\!\left(-j2\pi\frac{n}{T}t\right) dt = 0, \tag{23}$$

which is equivalent to

$$\int G\!\left(\frac{n}{T}-f\right) H(f)\, df = 0. \tag{24}$$

Eq. (24) holds if conditions A and C are satisfied. Condition B
guarantees that (18) holds when m = n = 0.


5.  EXPERIMENTAL  RESULTS 


Using  (10),  a  WWD  is  given  below.  We  call  this  WWD  the 
Gauss-Hamming  distribution  (GHD).  We  choose 


h(t)  =  g(t)=  j  — 7  exp 
ref 


f  t^ 


V  lo  J 


(25) 


Note  that  h(t)  and  g(t)  satisfy  conditions  A  and  B  of  section  4 
above.  Taking  the  Fourier  transform  of  (25),  we  get 

$$H(f) = G(f) = \sqrt{\sqrt{2\pi}\,t_0}\, \exp\!\left(-\pi^2 t_0^2 f^2\right). \qquad (26)$$

Defining the width of the function $\exp(-x^2/x_0^2)$ as $4x_0$, we obtain

$$T_h = T_g = 4t_0 \qquad (27)$$

and

$$F_h = F_g = \frac{4}{\pi t_0}. \qquad (28)$$

Substituting (28) and (27) into conditions C and D of section 4,
we obtain

$$T < \frac{\pi t_0}{4} \quad \text{and} \quad F < \frac{1}{4t_0}, \qquad (29)$$

respectively. $W_g(t,f)$ has the closed-form expression

$$W_g(t,f) = 2\exp\!\left[-\left(\frac{2t^2}{t_0^2} + 2\pi^2 t_0^2 f^2\right)\right]. \qquad (30)$$
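The sampling bounds for the Gaussian window can be checked numerically. The sketch below is only illustrative: it assumes conditions C and D take the form $T < 2/(F_h+F_g)$ and $F < 2/(T_h+T_g)$, which is the form consistent with the widths in (27) and (28) and the bounds quoted above; the function name `gaussian_window_bounds` is hypothetical.

```python
import math

def gaussian_window_bounds(t0):
    """Upper bounds on the STFT sampling steps T and F for
    h(t) = g(t) proportional to exp(-t^2 / t0^2).

    Assumes conditions C and D read T < 2/(F_h + F_g) and
    F < 2/(T_h + T_g); this is an assumption of the sketch.
    """
    width_t = 4.0 * t0                 # T_h = T_g, Eq. (27)
    width_f = 4.0 / (math.pi * t0)     # F_h = F_g, Eq. (28)
    t_max = 2.0 / (width_f + width_f)  # condition C
    f_max = 2.0 / (width_t + width_t)  # condition D
    return t_max, f_max

t_max, f_max = gaussian_window_bounds(1.0)
print(t_max, f_max)
```

For $t_0 = 1$ the two bounds evaluate to $\pi/4$ and $1/4$.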


The weight function is chosen as

$$\lambda(\alpha,\theta,\rho,\nu) = 0.54 + 0.46\cos\!\left(\pi\sqrt{\frac{(\alpha-\rho)^2}{a^2} + \frac{(\theta-\nu)^2}{b^2}}\right). \qquad (31)$$

The effect of the weight function (31) on a signal composed
of two Gaussian components is shown² in Fig. 1. A faster
decreasing λ(α,θ,ρ,ν), i.e., smaller a and b in (31), results in
poorer auto-term resolution and more cross-term suppression
(see Fig. 1(a)). The reverse situation is shown in Fig. 1(b).

The kernel Ψ(t,ν) in (14) uniquely specifies a given Cohen's
class TFR. For example, we used a Hamming window to obtain

$$\Psi(t,\nu) = 0.54 + 0.46\cos\!\left(\pi\sqrt{\frac{t^2}{a^2} + \frac{\nu^2}{b^2}}\right). \qquad (32)$$

We call the resulting TFR the Hamming distribution (HD).

We first consider a simulation involving a 128-point signal
composed of three Hamming-windowed parallel complex linear
chirps. The results are shown in Fig. 2. The WD is shown in Fig.
2(a) and a spectrogram using a Hamming window is given in
Fig. 2(b). Fig. 2(c) shows the Hamming distribution (HD) and
the GHD is given in Fig. 2(d). Note that the HD and the GHD
portray better auto-term resolution than the spectrogram and no
visible cross-terms.


2 All TFRs are displayed graphically by employing 7 linearly
spaced contours. All contour plots have time running
horizontally and increasing to the right; frequency runs vertically
and increases to the top.


Fig.  1.  GHD  contour  plots  for  the  simulation  involving  two 
Gaussian  signals,  (a)  Faster  decreasing  weight  function;  and  (b) 
slower  decreasing  weight  function. 




Fig.  2.  Contour  plots  for  the  simulation  involving  three  parallel 
complex  linear  chirps,  (a)  WD;  (b)  spectrogram;  (c)  HD;  and  (d) 
GHD. 


A simulation involving a 256-point signal consisting of a
complex sinusoid, a linear chirp, and a hyperbolic chirp is
depicted in Fig. 3. Fig. 3(a) displays the WD of the composite
signal whereas Fig. 3(b) shows the spectrogram when a 32-point
Hamming window is used. The spectrogram exhibits essentially
no cross-terms and a major resolution loss in the auto-terms. On
the other hand, the WD exhibits perfect resolution in the
auto-terms and the presence of all cross-terms. The HD and the GHD
are given in Figs. 3(c) and 3(d), respectively. The HD and the
GHD exhibit similar performance. In particular, both preserve
auto-term resolution better than the spectrogram and exhibit
fewer and much weaker cross-terms than the WD.

Fig.  4  shows  the  simulation  involving  a  128-point  signal 
composed  of  one  Gaussian  component,  one  linear  chirp,  and  one 
component  having  parabolic  instantaneous  frequency.  Fig.  4(a) 
displays  the  WD  with  its  undesirable  inner  and  outer  cross-terms. 
Fig. 4(b) displays a spectrogram with its undesirable low-
resolution auto-terms. As Figs. 4(c)-(d) show, both the HD and
the  GHD  are  good  compromises  between  the  WD  and  the 
spectrogram. 


Fig.  3.  Contour  plots  for  the  simulation  involving  a  signal 
composed  of  one  complex  sinusoid,  one  linear  chirp,  and  one 
hyperbolic chirp, (a) WD; (b) spectrogram; (c) HD; and (d)
GHD. 


Fig.  4.  Contour  plots  for  the  simulation  involving  a  signal 
composed  of  one  Gaussian  component,  one  linear  chirp,  and  one 
parabolic  chirp,  (a)  WD;  (b)  spectrogram;  (c)  HD;  and  (d)  GHD. 


Fig. 5 shows the simulation involving a bat chirp signal
emitted by the Large Brown Bat (Eptesicus fuscus)³. Both the
HD and the GHD in Figs. 5(c)-(d) exhibit much better auto-term
resolution than the spectrogram in Fig. 5(b) and far fewer
cross-terms than the WD in Fig. 5(a).


3 The authors wish to thank Curtis Condon, Ken White, and Al
Feng  of  the  Beckman  Institute  of  the  University  of  Illinois  for  the 
bat  data  and  for  permission  to  use  it  in  this  paper. 




Fig.  5.  Contour  plots  for  the  simulation  involving  a  bat 
echolocation  pulse,  (a)  WD;  (b)  spectrogram;  (c)  HD;  and  (d) 
GHD. 


6.  SUMMARY 

The  WD  can  be  decomposed  into  a  linear  combination  of 
elementary  WDs.  Slow-oscillatory  elementary  WDs  and  fast- 
oscillatory  elementary  WDs  mainly  contribute  to  auto-terms  and 
cross-terms,  respectively.  Using  a  weight  function  to  keep  slow- 
oscillatory  elementary  WDs  and  attenuate  fast-oscillatory 
elementary  WDs,  auto-term  resolution  and  cross-term 
suppression  can  be  effectively  balanced. 
ABSTRACT

In this paper, we demonstrate the use of support vector regression
(SVR) techniques for black-box system identification. These
methods derive from statistical learning theory, and are of great
theoretical and practical interest. We briefly describe the theory
underpinning SVR, and compare support vector methods with other
approaches using radial basis networks. Finally, we apply SVR to
modeling the behaviour of a hydraulic robot arm, and show that
SVR improves on previously published results.

1.  INTRODUCTION 

System identification of nonlinear black-box models is a crucial
but complex problem. There have been numerous recent papers
in the area based on neural networks, wavelet networks, hinging
hyperplanes, etc. Roughly speaking, one selects a set of
regressors/basis functions, and tries to determine the number of
basis/regressors and their parameters according to a given statistical
criterion. Many methods are based on a penalised maximum
likelihood criterion. Performing model selection and estimation
is usually a difficult task, however, as it involves solving complex
integration and/or optimisation problems. Gradient methods are
often used, but are only guaranteed to converge toward local optima.
Recently, in a Bayesian framework, Markov chain Monte
Carlo algorithms have also been developed. These methods are
computationally intensive, however.

We propose here an alternative approach based on support vector
machines. These comprise a set of powerful tools to perform
classification and regression [8], and have become very popular
recently in the machine learning community. This approach,
motivated by Statistical Learning Theory [10], is systematic and
principled. One can list its main advantages:

• There are very few free parameters to adjust.

• Estimating the unknown parameters only involves optimisation
of a convex cost function. This can be achieved using
standard quadratic programming algorithms. This is fast
and there are no local minima.

• The model constructed depends explicitly on the most
“informative” data (the support vectors).

• It is possible to obtain theoretical bounds on the generalisation
error and the sparseness of the solution (see [8]). These
bounds are independent of the distribution generating the
training and test data.

To the best of our knowledge, support vector regression (SVR)
has never been used in the context of system identification,
although it has been used in estimating time series by Müller et al.
[4], and Mattera and Haykin [3]. This work differs from these
previous studies in that it investigates the ν-SVR method [5], which
does not require us to specify an a priori level of accuracy. We
demonstrate the application of this algorithm to modeling a standard
data set, and show that it is possible to obtain results that improve
on current state-of-the-art methods [6], [7], with very little
tuning.

2.  BLACK-BOX  SYSTEM  IDENTIFICATION 

The problem of nonlinear black-box system identification consists
of conducting non-parametric regression, as described in Sjöberg
et al. [6], [7], among others. This means that random variables
$(\mathbf{x}, y)$, which take values in $\mathcal{X} \times \mathcal{Y}$, are generated according to a
distribution $P_{\mathbf{x},y}$, and we are required to estimate the regression
function of $y$ on $\mathbf{x}$, or

$$E_y(y \mid \mathbf{x} = x) = f(x).$$

We call $\mathbf{x}$ the regressor, and $y$ the output. We further define $\mathcal{X} = \mathbb{R}^d$
and $\mathcal{Y} = \mathbb{R}$. We want to estimate $f(\cdot)$ from the training sample

$$z = ((x_1,y_1),\ldots,(x_N,y_N)) \in (\mathcal{X} \times \mathcal{Y})^N,$$

each element of which is drawn from $P_{\mathbf{x},y}$. Since we do not know
the mapping $f(\cdot)$, we define a learning algorithm $A$, which gives
us an estimate $f_z(\cdot)$ of $f(\cdot)$.

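$$A : \bigcup_{N=1}^{\infty} (\mathcal{X} \times \mathcal{Y})^N \to \mathcal{H}, \qquad z \mapsto f_z(\cdot),$$

within a class $\mathcal{H} \subset \mathcal{Y}^{\mathcal{X}}$ (here $\mathcal{Y}^{\mathcal{X}}$ refers to the set of functions
mapping $\mathcal{X}$ to $\mathcal{Y}$) called the hypothesis space, which is flexible
enough to model a wide range of functions. An estimate $f_z(\cdot)$
associated with the loss $c(x, y, f_z(\cdot))$ is attained by minimising
the risk,

$$f_z(\cdot) = \arg\min_{g_z(\cdot) \in \mathcal{H}} \left[ R(g_z(\cdot)) = E_{\mathbf{x},y}\left[c(\mathbf{x}, y, g_z(\mathbf{x}))\right] \right]. \qquad (1)$$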

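Possible loss functions include quadratic loss,

$$c(x, y, g_z(\cdot)) = |y - g_z(x)|^2,$$

Vapnik’s ε-insensitive loss [10],

$$c(x, y, g_z(\cdot)) = \max\{0, |g_z(x) - y| - \epsilon\},$$

and Huber loss,

$$c(x, y, g_z(\cdot)) = \begin{cases} \epsilon\,|g_z(x) - y| - \frac{\epsilon^2}{2} & \text{for } |g_z(x) - y| > \epsilon, \\ \frac{1}{2}\,|g_z(x) - y|^2 & \text{otherwise,} \end{cases}$$

among others.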
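The three loss functions listed above can be written out directly. The following is a minimal sketch for scalar outputs; the function names are hypothetical, and the Huber loss is given in its standard form, which is assumed to be the one intended here.

```python
def quadratic_loss(y, g):
    """Quadratic loss |y - g|^2 for target y and prediction g."""
    return (y - g) ** 2

def eps_insensitive_loss(y, g, eps):
    """Vapnik's eps-insensitive loss: zero inside a tube of half-width eps."""
    return max(0.0, abs(g - y) - eps)

def huber_loss(y, g, eps):
    """Huber loss: quadratic for small residuals, linear beyond eps."""
    r = abs(g - y)
    if r > eps:
        return eps * r - 0.5 * eps ** 2
    return 0.5 * r ** 2

print(eps_insensitive_loss(0.0, 1.0, 0.25))
print(huber_loss(0.0, 2.0, 1.0))
```

Only the ε-insensitive loss is exactly zero over a whole interval of residuals, which is what leads to sparse solutions in support vector regression.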

In practice, the regression function $f_z(\cdot)$ cannot readily be obtained
from equation (1), since we do not usually know the distribution
$P_{\mathbf{x},y}$. One can instead minimise the empirical risk, the average
loss over the training sample, but doing so alone does not take into
account other requirements that we would like to satisfy, such as
smoothness, and can therefore result in overfitting [8], [10].

Classes of system identification problems falling within the
nonlinear black-box identification framework are described in [6],
[7]. These include nonlinear finite impulse response models,
nonlinear autoregressive models with external input, nonlinear output
error models, nonlinear autoregressive moving average models, nonlinear
Box-Jenkins models, etc.

3.  SUPPORT  VECTOR  REGRESSION 

We now describe how support vector machines may be used to
solve the system identification problem described in the previous
section. The results in this section are derived in Schölkopf et al.
[5].

To describe the ν-SVR procedure, we must first define a mapping
from the space $\mathcal{X}$ of regressors to the possibly infinite dimensional
hypothesis space $\mathcal{H}$, in which an inner product $\langle \cdot, \cdot \rangle_{\mathcal{H}}$
is defined. We formally describe this map as

$$\Phi : \mathcal{X} \to \mathcal{H}, \qquad x \mapsto \Phi(x).$$

We choose to limit our choice of regression function $f_z(\cdot)$ to the
class of functions which can be expressed as inner products in $\mathcal{H}$,
taken between some weight vector $w$ and the mapped regressor
$\Phi(x)$:

$$f_z(x) = \langle w, \Phi(x) \rangle_{\mathcal{H}} + b. \qquad (2)$$


Fig. 1. Robot arm data and ν-SVR model: training set and model
approximation.

The regression function in the hypothesis space is consequently
linear, and thus the nonlinear regression problem of estimating
$f_z(x)$ has become a linear regression problem in the hypothesis
space $\mathcal{H}$. Note that the mapping $\Phi(\cdot)$ need never be computed
explicitly; instead, we use the fact that if $\mathcal{H}$ is the reproducing kernel
Hilbert space induced by $k(\cdot,\cdot)$, then writing $\Phi(x) = k(x, \cdot)$, we
get

$$\langle \Phi(x_i), \Phi(x_j) \rangle_{\mathcal{H}} = k(x_i, x_j).$$

The latter requirement is met for kernels fulfilling the Mercer
conditions [8]. These conditions are satisfied for a wide range of
kernels, including Gaussian radial basis functions (see equation (6)).
We emphasise that the feature space need never be defined
explicitly, since only the kernel is used in SVR algorithms. Indeed, it
is possible for multiple feature spaces to be induced by a single
kernel.

We now describe the optimisation problem to be undertaken in
finding $f_z(\cdot)$. All support vector regression methods involve the
minimisation of a regularised risk functional, which represents a
tradeoff between smoothness and training error (the latter is
determined by the cost functional). In the case of the ν-SVR method,
the regularised risk $R_{reg}(f_z(\cdot), z)$ at the optimum is given by

$$\min_{w,b,\epsilon} R_{reg}(f_z(\cdot), z) = \min_{w,b,\epsilon} \left[ \tfrac{1}{2}\|w\|_{\mathcal{H}}^2 + C\left(\nu\epsilon + R_{emp}(f_z(\cdot), z)\right) \right], \qquad (3)$$

where we use the Vapnik ε-insensitive loss in the empirical risk;

$$R_{emp}(f_z(\cdot), z) = \frac{1}{N}\sum_{i=1}^{N} c(x_i, y_i, f_z(\cdot)) = \frac{1}{N}\sum_{i=1}^{N} (\xi_i + \xi_i^*),$$

in which

$$\xi_i = \max\{0, f_z(x_i) - y_i - \epsilon\} \quad \text{and} \quad \xi_i^* = \max\{0, -f_z(x_i) + y_i - \epsilon\}.$$


Fig. 2. Robot arm data and ν-SVR model: validation set and
model approximation.

All training points $(x_i, y_i)$ for which $|f_z(x_i) - y_i| \geq \epsilon$ are known
as support vectors; it is only these points that determine $f_z(\cdot)$.
Note that other loss functions, such as the Huber loss, can also be
used in support vector regression, although not all loss functions
result in a sparse representation. The terms C and ν in equation
(3) specify the tradeoff between model simplicity, the size of the
parameter ε below which the loss is zero, and the total empirical
loss over the training set, $R_{emp}(\cdot)$. Schölkopf et al. [5] describe
the theoretical behaviour of ν and C in more detail.

It can be shown [5] that the component w in equation (2) is a
linear combination of the mapped training points,

$$w = \sum_{i=1}^{N} (\alpha_i^* - \alpha_i)\,\Phi(x_i), \qquad (4)$$
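Substituting the expansion of w into equation (2) and using the kernel identity gives a regression function that depends on the data only through kernel evaluations, $f_z(x) = \sum_i (\alpha_i^* - \alpha_i) k(x_i, x) + b$. The sketch below illustrates this for a Gaussian kernel; the function names and the example coefficients are hypothetical, and the dual variables are assumed to have been obtained already from a quadratic programming solver.

```python
import math

def gaussian_kernel(x, z, sigma2):
    """Gaussian radial basis kernel exp(-||x - z||^2 / (2 sigma^2))."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-d2 / (2.0 * sigma2))

def svr_predict(x, train_x, alpha, alpha_star, b, sigma2):
    """Evaluate f_z(x) = sum_i (alpha*_i - alpha_i) k(x_i, x) + b.

    Training points with alpha_i = alpha*_i = 0 contribute nothing,
    so only the support vectors matter.
    """
    return b + sum((a_s - a) * gaussian_kernel(xi, x, sigma2)
                   for xi, a, a_s in zip(train_x, alpha, alpha_star))

# Hypothetical dual coefficients for three 1-D training points.
train_x = [(0.0,), (1.0,), (2.0,)]
alpha = [0.0, 0.5, 0.0]
alpha_star = [1.0, 0.0, 0.0]
print(svr_predict((0.0,), train_x, alpha, alpha_star, b=0.1, sigma2=1.0))
```

The third training point has both coefficients equal to zero and therefore plays no role in the prediction.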

and that solving equation (3) is equivalent to finding

$$\max_{\alpha,\alpha^*} \; -\frac{1}{2}\sum_{i,j=1}^{N} (\alpha_i^* - \alpha_i)(\alpha_j^* - \alpha_j)\,k(x_i, x_j) + \sum_{i=1}^{N} y_i(\alpha_i^* - \alpha_i), \qquad (5)$$

subject to

$$\sum_{i=1}^{N} (\alpha_i - \alpha_i^*) = 0, \qquad \alpha_i, \alpha_i^* \in \left[0, \frac{C}{N}\right], \qquad \sum_{i=1}^{N} (\alpha_i + \alpha_i^*) \leq C\nu.$$

There exist a number of methods that can be used to solve this
quadratic programming problem. Our results were obtained using
the LOQO algorithm in Vanderbei [9]. In the case of large training
sets, data decomposition methods exist to speed convergence; see
e.g. Chang et al. [2]. The offset b is found using

$$\langle w, \Phi(x_j) \rangle_{\mathcal{H}} + b - y_j = \epsilon \quad \text{when } \alpha_j \in \left(0, \frac{C}{N}\right),$$

$$y_i - \langle w, \Phi(x_i) \rangle_{\mathcal{H}} - b = \epsilon \quad \text{when } \alpha_i^* \in \left(0, \frac{C}{N}\right);$$

the set of equations thus obtained can be solved via linear least
squares.


Fig. 3. RMS error variation with ν, σ² and C. In each case, the
fixed parameters take their optimal values.


4. COMPARISON WITH STANDARD RBF APPROACHES

A popular set of regression functions are the radial basis functions.
The radial basis function expansion is

$$f_z(x) = \sum_{j=0}^{M} w_j\, k_j(\mu_j, x),$$

where M is the number of radial basis functions used (this need
not be the same as the number N of training points), $w_j \in \mathbb{R}$
scale the various basis functions $k_j(\mu_j, \cdot)$, $w_0$ scales the constant
offset term $k_0(\mu_0, \cdot) = 1$, and each basis function $k_j(\mu_j, \cdot)$ has
an individual centre parameter $\mu_j$ and width parameter $\sigma_j$. For
instance, in the case of Gaussian radial basis networks,

$$k_j(\mu_j, x) = \exp\!\left(-\frac{\|x - \mu_j\|^2}{2\sigma_j^2}\right). \qquad (6)$$
It is also possible to use a more general covariance matrix, rather
than $\sigma_j^2$; this results in a greater number of parameters that require
adjustment.

It is clear that SVR methods in fact produce radial basis function
networks, with all width parameters $\sigma_j^2$ set at the same value,
and centres $\mu_j$ corresponding to support vectors $x_j$ (thus M is
the number of support vectors). As discussed previously, the SVR
training procedure selects the training points to be used in this
expansion so as to avoid overfitting, and to achieve sparseness with
regard to the training data. Furthermore, the attendant optimisation
process is convex, and has a single optimum.

There is a great deal of literature, past and present, on methods
for training radial basis function networks; see for instance
Bishop [1]. Without going into detail, it is fairly common practice
to centre the basis functions on the training data and fix the
basis width a priori, as in SVR. Model selection (determining the
number of non-null weight vector components $w_j$) and parameter
estimation (estimating the values of the $w_j$) in traditional radial
basis function network methods, however, are usually based
on Bayesian/penalized maximum likelihood approaches; the
associated optimisation problems are often non-convex and possess
multiple local minima, which can lead to greater computational
complexity.

5.  EXPERIMENTAL  RESULTS 

In the following experiments, we make use of a Gaussian radial basis
function kernel, as described in equation (6), with kernel width
σ² (note that other kernel options, such as polynomial kernels or
sigmoid kernels, could also be used). As we perform SV regression,
the kernel centres are set at the training point locations $x_i$.
We apply the ν-SVR algorithm to modeling the behaviour of a
hydraulic robot arm; our result will be compared with the neural
network NARX and wavelet network NARX models in Sjöberg et al.
[6]. The input $u_t$ represents the size of the valve through which
oil flows into the actuator, and the output $y_t$ is a measure of oil
pressure (the latter determines the arm position). For the purpose
of comparison, we used the regressor

$$x_t = [\,y_{t-1} \;\; y_{t-2} \;\; y_{t-3} \;\; u_{t-1} \;\; u_{t-2}\,]^T,$$

since this is also used by Sjöberg et al. We also used half the
data set for training, and half as validation data, again following
the procedure of Sjöberg et al. The kernel width was set at σ² =
1.2242, and we used the ν-SVR parameters ν = 0.2444 and C =
4.07 × 10³. It must be emphasised that the experimental outcome
varies little for a wide range of parameter values; see figure 3. Note
also that prior knowledge of the observation noise would allow us
to select a value of ν that is asymptotically optimal in the number
of data [8].
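The regressor above is built from lagged outputs and inputs. As a minimal sketch of how such NARX regressors and their targets might be assembled from two recorded sequences (the function name and the toy data are hypothetical, and this is not the authors' Matlab code):

```python
def build_narx_regressors(y, u, ny=3, nu=2):
    """Return regressors [y(t-1)..y(t-ny), u(t-1)..u(t-nu)] and targets y(t).

    The first max(ny, nu) samples are skipped because their
    regressors would need data from before the start of the record.
    """
    start = max(ny, nu)
    regressors, targets = [], []
    for t in range(start, len(y)):
        past_y = [y[t - k] for k in range(1, ny + 1)]
        past_u = [u[t - k] for k in range(1, nu + 1)]
        regressors.append(past_y + past_u)
        targets.append(y[t])
    return regressors, targets

# Toy sequences standing in for the oil pressure and valve opening.
y = [0.0, 1.0, 2.0, 3.0, 4.0]
u = [10.0, 11.0, 12.0, 13.0, 14.0]
X, targets = build_narx_regressors(y, u)
print(X, targets)
```

Splitting the resulting lists in half then gives training and validation sets in the manner described above.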

The ν-SVR model output on the training data is given in figure 1,
and the model output on the validation data in figure 2. The
RMS error of this prediction on the validation set is 0.280, which
is lower than both the wavelet network RMS error (0.579), and
the prediction made by a one-hidden-layer sigmoid neural network
with ten hidden units (0.467). Although Sjöberg et al. were able to
reduce their RMS error further, to 0.328, on this data set, this required
assumptions regarding the model structure not made in our
algorithm. Further advantages of the ν-SVR solution include simplicity,
computational efficiency, robustness in the face of decreased
training set size, and ease of tuning, due to the low sensitivity of
the solution to changes in σ², ν and C. Our implementation of the
ν-SVR algorithm required 56 lines of Matlab code (excluding the
standard quadratic programming component), and took 193 seconds
to train on the data set in figure 1, using a Pentium III processor
running at 500MHz.

6.  CONCLUSION 

In this study, we describe the important theoretical and practical
advantages of support vector regression for black-box system
identification. The simplicity of implementation, coupled with good
performance in both this and other studies on time series prediction,
makes SVR methods an attractive alternative to standard system
identification techniques.
ABSTRACT

Model selection and system identification for cases where
the model is required to have both characteristics of
time-variance and nonlinearity are considered. To enable
identification from a single input/output observation record, the
time-variation is approximated by a weighted sum of orthogonal
sequences. Wavelet packets are chosen for these
sequences and an adapted basis for each time-varying coefficient
is selected via the Best Basis algorithm [1]. Individual
wavelet packets are then selected via a multiple hypothesis
test which determines those packets that are significant
to each approximation, and which may be discarded from
the model.

1. IDENTIFICATION OF TIME-VARYING
NONLINEAR SYSTEMS

All physical systems exhibit some degree of time-varying
nonlinear behavior. Despite the linear time-invariant model
being adequate for many systems, there is a growing requirement
for more accurate models to achieve higher performance.
Of particular interest at present is communications,
where channel characterisation is required for such
tasks as analysis and equalisation. In mobile communications,
multipath propagation and user motion result in a dispersive
time-varying communication channel [2]. Nonlinear
communication channels are also widely studied [3, 4,
5], typically modeled by a Volterra model.

A time-varying quadratic Volterra model with memory
M may be written as

$$y(n) = \sum_{m_1=0}^{M-1} h_1(n,m_1)\,x(n-m_1) + \sum_{m_1=0}^{M-1}\sum_{m_2=0}^{M-1} h_2(n,m_1,m_2)\,x(n-m_1)\,x(n-m_2)$$

for the observed time record of n = 0, 1, ..., N − 1. The
time-varying kernels $h_1(n,m_1)$ and $h_2(n,m_1,m_2)$ respectively
represent the linear and quadratic time-varying dynamics
of the system. Taking into account the symmetry
in arguments of the quadratic kernel (i.e. $h_2(n,m_1,m_2) =
h_2(n,m_2,m_1)$), the model contains $(M^2 + 3M)/2$ time-varying
coefficients and thus $N(M^2 + 3M)/2$ parameters.
To enable their estimation from a single input/output record,
a subset of sequences from an orthogonal basis may be used
to approximate the time-variation. The basis from which
the sequences are taken may be determined using a priori
knowledge of the characteristics of the system’s time-variation,
or from a general basis when this information is
lacking. An approximation for the linear kernel may be
written as

$$h_1(n,m_1) \approx \sum_{sfp \in Q_{m_1}} \beta_{1sfp}(m_1)\,\psi_{sfp}(n)$$

with a similar expression for $h_2(n,m_1,m_2)$. The summation
is over $Q_{m_1}$, a set of Q triples {sfp} indexing the
coefficients $\beta_{1sfp}(m_1)$, and their corresponding orthogonal
wavelet packets $\psi_{sfp}(n)$. For example, say

$$h_1(n,0) \approx \beta_{1,200}(0)\,\psi_{200}(n) + \beta_{1,210}(0)\,\psi_{210}(n) + \beta_{1,110}(0)\,\psi_{110}(n) + \beta_{1,111}(0)\,\psi_{111}(n),$$

then $Q_0 = \{\{200\}, \{210\}, \{110\}, \{111\}\}$, for Q = 4.
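To make the time-varying quadratic Volterra model concrete, the sketch below evaluates its output for kernels supplied as functions of n and the lags, and counts the distinct time-varying coefficients. The function names are hypothetical, and input samples before the start of the record are taken as zero, which is an assumption of this sketch rather than something stated in the text.

```python
def num_tv_coefficients(M):
    """Number of distinct time-varying coefficients, (M^2 + 3M) / 2."""
    return (M * M + 3 * M) // 2

def volterra_output(x, h1, h2, M):
    """Output y(n) of a time-varying quadratic Volterra model.

    h1(n, m1) and h2(n, m1, m2) are callables returning kernel values;
    x(n) is treated as zero for n < 0 (assumption).
    """
    def sample(i):
        return x[i] if i >= 0 else 0.0

    y = []
    for n in range(len(x)):
        linear = sum(h1(n, m1) * sample(n - m1) for m1 in range(M))
        quadratic = sum(h2(n, m1, m2) * sample(n - m1) * sample(n - m2)
                        for m1 in range(M) for m2 in range(M))
        y.append(linear + quadratic)
    return y

print(num_tv_coefficients(2))
print(volterra_output([1.0, 2.0], lambda n, m1: 1.0,
                      lambda n, m1, m2: 0.5, M=2))
```

Replacing each kernel by a weighted sum of a few known sequences, as in the approximation above, leaves only the time-invariant weights to be estimated.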

Since all $\psi_{sfp}(n)$ are known, we need only estimate the
time-invariant kernels $\beta_{1sfp}(m_1)$ and $\beta_{2sfp}(m_1,m_2)$. If Q
wavelet packets are used for each approximation there are
$Q(M^2 + 3M)/2$ unknown time-invariant coefficients, a
great reduction in unknowns if $Q \ll N$. Being
linear-in-the-parameters, the Volterra model may be written as a
linear regression in matrix form

$$y = X_\psi b + e,$$

where y is a vector of N output observations, $X_\psi$ is a matrix
incorporating the input observations x(n) with the wavelet
packets $\psi_{sfp}(n)$, b is a vector containing coefficients
from $\beta_{1sfp}(m_1)$ and $\beta_{2sfp}(m_1,m_2)$ for $sfp \in Q_{m_1}$ and
$sfp \in Q_{m_1 m_2}$ respectively, and $m_1 = 0, 1, \ldots, M-1$ and
$m_2 = m_1, \ldots, M-1$. The error term e denotes noise and
modeling mismatch. The least squares solution is

$$\hat{b} = (X_\psi^T X_\psi)^{-1} X_\psi^T y.$$
2. BEST WAVELET PACKET BASIS


Wavelets have been advocated for their flexibility and applicability
to real-life signal characteristics [6]. All wavelets in a
particular basis are scaled and translated versions of a single
analysing wavelet

$$\psi_{jk}(t) = \sqrt{2^j}\,\psi(2^j t - k)$$

where $j, k \in \mathbb{Z}$. The analysing wavelet is defined in terms
of a scaling function, $\phi(t)$,

$$\psi(t) = \sum_k (-1)^k h_{1-k}\,\sqrt{2}\,\phi(2t - k).$$

The scaling function satisfies

$$\phi(t) = \sum_k h_k\,\sqrt{2}\,\phi(2t - k),$$

where $h_k = \langle \phi(t), \sqrt{2}\,\phi(2t - k) \rangle$ are refinement coefficients.
The set of functions $\phi(t)$ and $\psi_{jk}(t)$, $j = 0, 1, \ldots$,
$k = 0, 1, \ldots, 2^j - 1$, forms an orthogonal basis in $L^2(\mathbb{R})$.

Generalising this basis, wavelet packets are particular
combinations or superpositions of wavelets. Defining the
following sequence of functions

$$\psi_{2f}(t) = \sqrt{2}\sum_k h_k\,\psi_f(2t - k)$$

$$\psi_{2f+1}(t) = \sqrt{2}\sum_k g_k\,\psi_f(2t - k)$$

where $\psi_0(t) = \phi(t)$ and $\psi_1(t) = \psi(t)$ (so that $g_k = (-1)^k h_{1-k}$,
as in the definition of $\psi(t)$), the wavelet packets
are dilated/translated versions of these functions:

$$\psi_{sfp}(t) = 2^{s/2}\,\psi_f(2^s t - p)$$

for $0 \leq s \leq L$, $0 \leq f < 2^s$, $0 \leq p < 2^{L-s}$ and $Q = 2^L$.
The indices s, f and p relate to scale, frequency and position,
respectively. Using these indices the packets may
be organized into a rectangular binary tree of nodes
corresponding to packets of equivalent sf, as illustrated in
Figure 1. Each node (rectangle) is the parent of the two nodes
directly below it, and the child of the one above. A wavelet
packet in any one node is orthogonal to the other packets in
that node as well as packets in other nodes with which its
node does not vertically overlap.

Figure 1: Organisation of wavelet packets into nodes showing
the indices sfp.

Figure 2: Wavelet packet bases used in identification.

Orthogonality properties of wavelet packets allow the
formation of many bases. Furthermore, these properties
enable the wavelet packet bases to be efficiently searched
to find the most appropriate basis for use in approximating
a signal. The Best Basis algorithm [1] is used for
this, determining the minimum entropy basis. Minimising
the entropy results in a basis whose set of coefficients has
a relatively small number that are non-negligible and the
sum of magnitudes of the negligible coefficients is negligible.
Entropy of a sequence $c = \{c_i : i = 1, 2, \ldots, N\}$ is
defined as

$$M(c) = -\sum_i p_i \log p_i, \quad \text{where } p_i = \frac{|c_i|^2}{\|c\|^2}.$$

The algorithm begins by calculating the entropies for the
coefficients of the wavelet packets in each node. Then, starting
from the bottom of the tree, the entropies of the child nodes
are compared to their parents. If the child nodes have lower
entropy, they replace the parent node and become a child of
the next parent node.
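The bottom-up search described above can be sketched in a few lines. In this illustration a node is a pair (coefficients, children); how the two children's entropies are combined before comparison with the parent is not spelled out here, so the sketch simply sums them, which is an assumption. The names `entropy` and `best_basis` are hypothetical.

```python
import math

def entropy(coeffs):
    """Entropy -sum p_i log p_i with p_i = |c_i|^2 / ||c||^2."""
    energy = sum(abs(c) ** 2 for c in coeffs)
    if energy == 0.0:
        return 0.0
    total = 0.0
    for c in coeffs:
        p = abs(c) ** 2 / energy
        if p > 0.0:
            total -= p * math.log(p)
    return total

def best_basis(node, cost=entropy):
    """Return (cost, list of coefficient blocks) for the cheapest basis.

    node = (coeffs, children), where children is () for a leaf or a
    pair (left, right). Children replace the parent when their summed
    cost (an assumption of this sketch) is lower than the parent's.
    """
    coeffs, children = node
    own_cost = cost(coeffs)
    if not children:
        return own_cost, [coeffs]
    results = [best_basis(child, cost) for child in children]
    child_cost = sum(c for c, _ in results)
    if child_cost < own_cost:
        return child_cost, [blk for _, blocks in results for blk in blocks]
    return own_cost, [coeffs]

left = ([2.0, 0.0], ())
right = ([0.0, 0.0], ())
parent = ([1.0, 1.0, 1.0, 1.0], (left, right))
print(best_basis(parent))
```

In this toy tree the children concentrate all the energy in one coefficient, so they are preferred over the parent, whose energy is spread evenly.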

To find the best basis for each of the system's time-varying
coefficients, a time-varying model is identified for
several bases of wavelet packets to get values for all the
coefficients. These bases are illustrated in Figure 2 for Q = 8.
Four distinct bases cover the entire library of wavelet packets
which consists of 32 possible bases. Best Basis is then
run on both sets of coefficients $\beta_{1sfp}(m_1)$, $sfp \in Q_{m_1}$, and
$\beta_{2sfp}(m_1,m_2)$, $sfp \in Q_{m_1 m_2}$, for $m_1 = 0, 1, \ldots, M-1$,
$m_2 = m_1, \ldots, M-1$.

Now that a set of Q sequences has been chosen for each
time-varying coefficient, it is desired to discard those sequences
that do not contribute significantly to the approximation.
This may be done in several ways, including thresholding
or hypothesis testing.

3.  MULTIPLE  HYPOTHESIS  TESTING 
FOR  BASIS  SEQUENCE  SELECTION 

To test the significance a particular wavelet packet has on
the regression, rewrite the regression as

$$y = x_{\psi i}\,\beta_i + X_I b_I + e$$

where $\beta_i$ is the coefficient under test, $x_{\psi i}$ is its corresponding
vector from $X_\psi$, and $X_I$ and $b_I$ are the rest of $X_\psi$ and
b from the regression, respectively. There are $Q(M^2 +
3M)/2 = QP$ coefficients to test so i = 1, 2, ..., QP. The
hypothesis

$$H_i : \beta_i = 0, \quad b_I \text{ unspecified}$$

is tested against the two-sided alternative

$$K_i : \beta_i \neq 0$$

for i = 1, 2, ..., QP. Performing an α-level test on this
hypothesis, if we reject $H_i$ we declare the sequence significant,
and the probability of doing so when $H_i$ is in fact true is at
most α. Strictly speaking, if we do not
reject $H_i$ we cannot say whether the sequence is significant
or not. However, we accept the risk of not rejecting a false
hypothesis and remove the sequence from the model.

A suitable test statistic for testing $H_i$ measures the
reduction in the residual sum of squares due to adding the
parameter $\beta_i$ to the regression:

$$T_i = (N - QP)\,\frac{\|y - X_I \hat{b}_I\|^2 - \|y - X_\psi \hat{b}\|^2}{\|y - X_\psi \hat{b}\|^2},$$

where $\hat{b}_I = (X_I^T X_I)^{-1} X_I^T y$. Under the null, $T_i$ is
F-distributed with degrees of freedom 1 and N − QP. The
P-value, obtained from $T_i$,

$$\mathcal{P}_i = \Pr\{F_{1,(N-QP)} > T_i \mid H_i\},$$

determines the significance of the sequence.
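Given fitted outputs from the full regression and from the regression with one sequence removed, the test statistic is a simple ratio of residual sums of squares. The following sketch shows the arithmetic only; the function names and the toy numbers are hypothetical, and converting the statistic into a P-value would additionally require the F distribution function, which is not included here.

```python
def rss(y, y_hat):
    """Residual sum of squares between observations and fitted values."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat))

def f_statistic(y, fit_reduced, fit_full, n_params):
    """T = (N - n_params) * (RSS_reduced - RSS_full) / RSS_full.

    fit_reduced: fitted values without the sequence under test.
    fit_full:    fitted values with all n_params coefficients.
    """
    rss_reduced = rss(y, fit_reduced)
    rss_full = rss(y, fit_full)
    return (len(y) - n_params) * (rss_reduced - rss_full) / rss_full

y = [1.0, 2.0, 3.0, 4.0]
fit_full = [1.1, 1.9, 3.1, 3.9]
fit_reduced = [2.0, 2.0, 3.0, 3.0]
print(f_statistic(y, fit_reduced, fit_full, n_params=2))
```

A large value indicates that removing the sequence noticeably worsens the fit.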

Since many parameters are to be tested, a multiple hypothesis
test is employed. The most widely applied is the
Bonferroni test [7]. The Bonferroni test is simple to use and
enables individual hypotheses to be identified. We reject $H_i$
if

$$\mathcal{P}_i \leq \frac{\alpha}{QP}, \quad i = 1, 2, \ldots, QP.$$

This is seen as quite a conservative test though.

To increase the power of the Bonferroni test a sequentially
rejective Bonferroni (Holm’s) procedure [8] may be
used. Sorting the hypotheses, $H_{(1)}, H_{(2)}, \ldots, H_{(QP)}$, so
their corresponding P-values satisfy $\mathcal{P}_{(1)} \leq \mathcal{P}_{(2)} \leq \cdots \leq
\mathcal{P}_{(QP)}$, $H_{(i)}$ is then rejected if

$$\mathcal{P}_{(j)} \leq \frac{\alpha}{QP - j + 1} \quad \text{for all } j \leq i.$$

Other modifications to the Bonferroni procedure that aim to make it less conservative provide a test for $H_0$ only, not for individual hypotheses. Such tests may be extended to enable rejection of individual hypotheses [9]. This was considered in [10].
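The difference between the Bonferroni test and Holm's sequentially rejective procedure can be seen on a small set of $P$-values. The sketch below uses hypothetical function names and made-up $P$-values. It only illustrates the two decision rules and is not the authors' implementation.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H_i when P_i <= alpha / m, with m the number of hypotheses."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def holm(p_values, alpha=0.05):
    """Holm's step-down rule: test sorted P-values against alpha / (m - rank)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):      # rank 0 is the smallest P-value
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break                           # stop at the first non-rejection
    return reject

# Made-up P-values for illustration only.
p = [0.001, 0.011, 0.02, 0.3, 0.012]
bonf = bonferroni(p)
hlm = holm(p)
```

With these values the Bonferroni test rejects only the first hypothesis, while Holm's procedure rejects four, which matches the observation that Holm's procedure is never less powerful.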


4.  SIMULATION  RESULTS 

The simulated system was a quadratic Volterra model with memory $M = 2$. Each of the 5 time-varying coefficients was formed by summing the first $Q = 8$ Haar-Walsh wavelet packets with random coefficients uniformly distributed on $[-1, 1]$. The input was Gaussian, $\mathcal{N}(0, 1)$, and of length $N = 128$.

Four models were identified, each using a basis of 8 wavelet packets (as in Figure 2). The basis for the 4th model corresponded to a Walsh basis. The entropies of the basis expansions for this model were 1.2074, 1.2552, 1.0993, 1.5547 and 1.5102. Using Best Basis to find the most efficient basis expansion for each time-varying coefficient led to the entropies 1.2055, 1.2177, 1.0132, 0.9884 and 1.2571, which are lower. The wavelet packets chosen were different from those used to simulate the system, as illustrated in Figure 3.


h(n,  1,0),  and  h(n,  1, 1). 

Figure  3:  Basis  used  to  simulate  the  system  and  best  bases 
chosen  for  model. 

Next, the contribution of each wavelet packet is tested. The $P$-values were found to be better indicators of the contribution than the magnitudes of the packets' coefficients. Starting with a model containing no wavelet packets at all, the packets were added one at a time according to their $P$-values and the model's output mean squared error (MSE) was calculated. This was also done according to coefficient magnitudes. The results are shown in Figure 4. Including coefficients in the model according to their $P$-value leads to a more accurate model. Thus if a model with only 30 coefficients is desired, the 30 coefficients with the smallest $P$-values would be chosen, not those with the largest magnitude. This demonstrates that selecting coefficients via hypothesis testing is more effective than simple thresholding.

Figure 4: Effect of increasing model size on output MSE.

Both the Bonferroni and Holm's multiple test procedures were applied to the selection of sequences over a range of signal-to-noise ratios (SNRs). The number of hypotheses rejected by each test was averaged over 1000 runs (models). Figure 5(a) shows that Holm's procedure usually rejects more hypotheses than the Bonferroni test. Figure 5(b) compares applying Holm's procedure to the best basis model and the Walsh basis model. At higher SNR, fewer wavelet packets are significant than Walsh sequences, indicating a more parsimonious model. However, at lower SNR more wavelet packets are selected. This may be because the more flexible model fits more of the noise, so that a number of falsely significant wavelet packets arise.


Figure  5:  Multiple  hypothesis  test  results. 


5.  CONCLUSION 

Choosing  to  use  wavelet  packets  as  the  sequences  to  ap¬ 
proximate  the  system’s  time-varying  coefficients  makes  the 
model  more  flexible  since  a  library  of  Q(log2(Q)  +  1)  se¬ 
quences  becomes  available,  from  which  Q  orthogonal  se¬ 
quences  may  be  chosen.  Using  another  (non-wavelet)  ba¬ 
sis  restricts  the  range  of  consideration  to  just  Q  specific  se¬ 
quences.  Thus,  the  wavelet  packets  may  lead  to  a  better 
characterisation.  To  select  the  wavelet  packets  that  most  ef¬ 
ficiently  approximate  the  time- varying  coefficients,  the  Best 
Basis  algorithm  may  be  employed. 

To then determine which wavelet packets to keep in the model, a multiple hypothesis test was applied. The $P$-values were found to be a better indicator of the significance of a packet than the magnitude of its coefficient. Holm's test was slightly more powerful than Bonferroni at higher SNR, usually selecting 1 or 2 extra wavelet packets. Fewer wavelet packets were selected at high SNR than when using a model with $Q$ Walsh sequences. At low SNR, more wavelet packets were usually selected.
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ABSTRACT 


2.  OPERATION  OF  TCP  VEGAS 


This  paper  addresses  the  problem  of  building  appropriate  statisti¬ 
cal  models  of  the  way  the  Internet  appears  from  the  point  of  view 
of  congestion,  to  a  Transmission  Control  Protocol  (TCP)  sender. 
TCP  is  a  mechanism  for  implementing  full  duplex,  acknowledged, 
end-to-end  transmission  over  an  Internet  Protocol  (IP)  network. 
This  work  has  been  motivated  by  the  most  recent  TCP  variant,  the 
so-called  Vegas  implementation.  TCP  Vegas  is  really  the  first  im¬ 
plementation  to  be  based  loosely  on  system  theoretic  ideas  in  the 
sense  that  it  measures  the  segment  round-trip  times  across  the  net¬ 
work  to  adjust  its  transmission  rate.  This  paper  develops  a  new  lin¬ 
ear  system  framework  for  TCP,  and  applies  Recursive  Prediction 
Error  identification  techniques  to  specify  statistical  models  which 
may  be  used  to  develop  alternative  control  strategies.  Network 
simulations  are  used  to  illustrate  behaviour. 


The network will be characterised by a set of $Q$ TCP sources $S = \{s_i : i = 1, \ldots, Q\}$, and associated receivers. Each source $i$ sets its transmission rate by maintaining a congestion window of length $w_i(t)$, where $t$ denotes a discrete time index. The source also measures the round trip time (RTT) $D_i(t)$ associated with the time of each received ACK. The standard TCP Vegas control law for adjusting $w_i(t)$ is

f  Did)-1  ii(t)>a 

Wi(t  +  1)  =  wi(t)  +  S  —Did)-1  Zid)  <  0  (1) 

[  0  else  , 

where 


1.  INTRODUCTION 


(2) 


The  rapid  increase  in  Internet  utilisation  has  led  to  high  levels  of 
congestion  in  some  parts  of  the  network,  and  has  provided  a  driv¬ 
ing  force  to  improve  efficiencies  and  throughput.  End-to-end  con¬ 
gestion  control  is  included  as  part  of  the  implementation  of  the 
Transmission  Control  Protocol  (TCP)  which  provides  a  full  duplex 
connection  across  the  network.  Reliability  is  achieved  in  TCP,  by 
use  of  acknowledgements  (ACKs)  returned  from  the  receiver  to 
the  sender.  The  vast  majority  (>  90%)  of  applications  use  TCP 
as  the  transport  mechanism,  so  improvements  in  the  operation  of 
TCP  could  well  lead  to  significant  improvements  in  Internet  per¬ 
formance.  Indeed,  variants  of  the  original  TCP  implementation 
have been proposed in an attempt to achieve just this. One significant recent proposal is that of TCP Vegas [1]. While earlier variants, such as TCP Reno (the most common in use currently), utilised a very coarse method of congestion control, Vegas adjusts its transmission rate in response to the round trip times (RTTs) of segments, i.e. the time between a segment being sent and the reception of the corresponding ACK. Thus there is, at least in principle, the presence of a feedback control mechanism. Vegas, however,
does  not  attempt  to  use  any  model  of  the  relationship  between 
transmission  rate  and  RTTs,  except  to  recognise  that  large  RTTs 
are  indicative  of  congestion  and  thus  force  reductions  in  transmis¬ 
sion  rate,  albeit  in  a  rather  crude  and  inefficient  manner.  Vegas 
relies  on  a  very  simple  control  law  precisely  because  it  does  not 
utilise  any  modelling  information.  The  purpose  of  this  work  is 
to  (i)  derive  a  linear  modelling  representation  consistent  with  the 
operation  of  Vegas,  and  (ii)  investigate  closed  loop  system  identifi¬ 
cation  schemes  which  are  suitable  for  operation  in  this  framework, 
including  appropriate  disturbance  modelling. 


Here $d_i$ denotes the round trip propagation delay for source $i$, and $\alpha$ and $\beta$ are design parameters. We address their selection subsequently. We initially assume that $\alpha = \beta$ for ease of analysis. Thus in this case, (1) may be written

$$ w_i(t+1) = w_i(t) + \frac{\operatorname{sgn}(e_i(t))}{D_i(t)}, \tag{3} $$

where the error term is given by

$$ e_i(t) = \alpha d_i - w_i(t)\left(1 - \frac{d_i}{D_i(t)}\right). \tag{4} $$
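A single step of the simplified update (3) with error term (4) can be sketched as follows. This is only an illustration under stated assumptions: the function name `vegas_step` is hypothetical, and the delays are expressed in units of the discrete time step so that the increment $1/D_i(t)$ is in packets per step.

```python
def sign(x):
    """Return -1, 0 or +1 according to the sign of x."""
    return (x > 0) - (x < 0)

def vegas_step(w, rtt, d, alpha):
    """One update of (3)-(4); returns (new window, error term)."""
    # Error (4): target alpha*d minus the number of packets currently queued.
    e = alpha * d - w * (1.0 - d / rtt)
    # Update (3): move the window by 1/RTT in the direction of the error.
    return w + sign(e) / rtt, e

# Assumed values: propagation delay 4 steps, measured RTT 5 steps, alpha*d = 1.
w_congested, e_congested = vegas_step(10.0, 5.0, 4.0, 0.25)
# With no queueing (RTT equal to the propagation delay) the window grows.
w_idle, e_idle = vegas_step(10.0, 4.0, 4.0, 0.25)
```

In the first call about two packets are queued against a target of one, so the error is negative and the window shrinks by $1/5$. In the second call nothing is queued and the window grows by $1/4$.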


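In the following, we use the transformed measurements

$$ y_i(t) = 1 - \frac{d_i}{D_i(t)}, \tag{5} $$

which may be interpreted as the bandwidth efficiency for source $i$. So (3) and (4) may be written

$$ w_i(t+1) = w_i(t) + \frac{\operatorname{sgn}(e_i(t))}{d_i}\,\bigl(1 - y_i(t)\bigr), \qquad e_i(t) = r_i - w_i(t)\,y_i(t). \tag{6} $$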


This error term can be easily interpreted as the difference between the term $r_i = \alpha d_i$ and the number of packets queued in the system for source $i$ at time $t$. Thus $r_i$ can be regarded as the desired number of packets from source $i$ to be queued in the system at any given time. These parameters can be regarded as resource allocation parameters [2]. The equations (3) and (4) thus define a control system model for TCP Vegas. The measurements $D_i(t)$, and thus $y_i(t)$, are related to all users' window sizes, i.e. $w_1(s), \ldots, w_Q(s)$ for $s < t$, and to other factors such as the network topology and queueing disciplines.

3.  A  LINEAR  SYSTEM  MODEL 

In this section, we define a linear system model consistent with the operation of Vegas. The key idea we shall use is to work with logarithmic values $v_i(t)$ of the congestion window and to use a linear update rule for these quantities. This corresponds to a proportional update rule for the window length itself. The error term $\epsilon_i(t)$ is chosen to be of the form

$$ \epsilon_i(t) = \log\left(\frac{r_i}{w_i(t)\,y_i(t)}\right) = s_i - v_i(t) - z_i(t) \tag{7} $$

where $s_i = \log r_i$ and $z_i(t) = \log y_i(t)$. Clearly $\epsilon_i$ has the same qualitative type of behaviour as $e_i$ in that it is positive if the number of queued packets for user $i$ is less than the desired number, and negative otherwise. We argue however that such an error function is more appropriate for positive quantities than a difference of terms.

The key challenge addressed here is the derivation of a decentralised model for the observation process

$$ z_i(t) = \log\left(1 - \frac{d_i}{D_i(t)}\right) \tag{8} $$

in terms of the inputs $v_i(t)$, where we recall that $D_i(t)$ is the RTT seen by user $i$. Depending on the state of the network, which is determined by the other users as well as user $i$, this quantity (which is always negative) will be some function of $v_i(t)$. We postulate a linear model of the form

$$ z_i(t) = G_i(z, \theta_i)\,v_i(t) + H_i(z, \theta_i)\,u_i(t), \tag{9} $$

for a particular value of the parameter vector $\theta_i$, where $G_i(z, \theta_i)$ and $H_i(z, \theta_i)$ are respectively strictly proper and proper rational transfer functions, with $H_i(z, \theta_i)$ and $H_i^{-1}(z, \theta_i)$ having all poles inside the unit circle. Here $u_i(t)$ denotes a white noise process. The role of $H_i$ here is to model the effect in closed loop of other users on the congestion seen by user $i$, while $G_i$ models the effect of user $i$'s transmission rate on the congestion as seen by user $i$. Since $z_i(t)$ can take the value $-\infty$ when there are no packets queued in the network, we need to include a hard limit on $z_i(t)$. This is consistent with the operation of TCP, where the receiver sets an upper bound on the congestion window size and thus on $v_i(t)$ in our model.

Looking forward to the potential application of our identified model to the derivation of alternative congestion control strategies, we follow the methodology of [3], and choose the Recursive Prediction Error (RPE) identification procedure for its robustness properties. The prediction error is given by

$$ \mu_i(t, \theta_i) = z_i(t) - \hat{z}_i(t \mid t-1, \theta_i) = -H_i(\infty)\,H_i^{-1}(z, \theta_i)\,G_i(z, \theta_i)\,v_i(t) + H_i(\infty)\,H_i^{-1}(z, \theta_i)\,z_i(t). $$

We recursively minimise the variance of the prediction error using the familiar Recursive Least Squares (RLS) approach, yielding estimates $\hat{\theta}_i(t)$ for $\theta_i$. Figure 1 depicts the structure of the closed-loop system with the identification process included.
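The recursive least squares update mentioned above can be sketched on a toy problem. The example below is a simplification under stated assumptions: it identifies a hypothetical first-order model $z(t) = a\,z(t-1) + b\,v(t-1)$ without a noise model, so it is plain RLS rather than the full RPE scheme with $H_i$, and the names `rls_update`, `theta` and `P` are illustrative.

```python
import random

def rls_update(theta, P, phi, y, lam=1.0):
    """One RLS step for y = phi . theta + noise, with forgetting factor lam."""
    n = len(theta)
    P_phi = [sum(P[r][c] * phi[c] for c in range(n)) for r in range(n)]
    denom = lam + sum(phi[r] * P_phi[r] for r in range(n))
    gain = [value / denom for value in P_phi]
    err = y - sum(phi[r] * theta[r] for r in range(n))  # one-step prediction error
    theta = [theta[r] + gain[r] * err for r in range(n)]
    # P <- (P - gain * phi^T P) / lam, using phi^T P = P_phi^T for symmetric P.
    P = [[(P[r][c] - gain[r] * P_phi[c]) / lam for c in range(n)] for r in range(n)]
    return theta, P, err

# Hypothetical test system (not the paper's model orders):
#   z(t) = 0.8 z(t-1) + 0.5 v(t-1)
rng = random.Random(0)
theta = [0.0, 0.0]
P = [[1e4, 0.0], [0.0, 1e4]]
z_prev = v_prev = 0.0
for _ in range(300):
    v = rng.uniform(-1.0, 1.0)
    z = 0.8 * z_prev + 0.5 * v_prev
    theta, P, _err = rls_update(theta, P, [z_prev, v_prev], z)
    z_prev, v_prev = z, v
```

Choosing `lam` slightly below 1 discounts old data, which is one common way to let the estimates track a change such as a jump in background traffic.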


Fig.  1.  TCP  closed  loop  identification  system  model 


4.  NETWORK  SIMULATIONS 

We conducted a network simulation with 2 TCP senders and 2 receivers, transmitting over a link of length 1000 km and capacity 2.5 Mbyte/sec. Thus the round trip propagation delay = 667 μsec. A third (Poisson) source simulates background traffic on the link. For the purposes of this simulation, slow start and duplicate acknowledgements arising from timeouts were switched off, so that we can concentrate on the essential features of the congestion management function. A Vegas type control process was used for each TCP sender. The sample rate was 250 samples/sec and the simulation time was 50 s. The mean background Poissonian traffic was set at 1.75 Mbyte/sec with a jump to 2.25 Mbyte/sec at $t = 25$ s. TCP sender 1 has a set point $r_1 = 0.1$, while for TCP sender 2, the set point was $r_2 = 1$.

Figures 3-5 show, respectively, the instantaneous background traffic rate and the sending rates for each TCP sender. The mean total combined rate in this example is 2.49 Mbyte/sec, a 99.5% utilisation. The data has been smoothed with a single pole filter with pole at 0.97, in order to aid readability. Our class of models has the form


Fig.  2.  TCP  Network  Simulation 


Gi(z)  = 


-£M=1  fci.j  z-i 
1  + 


z~J 


Hi(z) 


,  l  +  Z-Udijz-i 
1 1 *  ’ 

(10) 


where $M < N$. We chose (after some experimentation) $N = 10$, $M = 9$ and $P = 6$. Figure 6 shows an estimated power spectrum of the signal $z_1(t)$. The dynamic range is approximately 22 dB. Figure 7 shows the estimated power spectrum of the prediction error $\mu_1(t)$. The dynamic range has been reduced to about 7.5 dB, indicating that the model has captured significant information about the system. It has both suppressed the low frequency variability by about 6 dB and significantly whitened the spectrum at mid to high frequencies. Figures 8 and 9 show the filter transfer function magnitudes $|G_1(z)|$ and $|H_1(z)|$ for user 1 at the end of the first 25 sec, and then at the end of 50 sec. Thus we can observe the effect of the change in background traffic intensity at $t = 25$ sec. It should be pointed out that the RPE technique does not explicitly yield the power of the exogenous disturbance white noise process $u_i(t)$, which is encompassed in the parameter $\delta_i$ in (10). Thus we can only estimate $H_i(\infty)^{-1} H_i(z)$, and consequently Figure 9 shows little variation in the colouring of the background traffic. However, Figure 8 illustrates both the drop in transmission rate for sender 1 and its lower frequency content, as is evident from Figure 4.


Fig.  4.  Transmission  Rate  -  TCP  Sender  1 


Fig.  5.  Transmission  Rate  -  TCP  Sender  2 


5. CONCLUSION

sender in closed loop. The model incorporates both the effect of that user's transmission rate, and the aggregated effect on that user's observed congestion from all other users. Some preliminary simulations for a bottleneck queue suggest that such a modelling strategy can encompass some of the relevant information about observed congestion. These studies are very preliminary: there are many issues to investigate, such as choice of model order and tracking capability. More extensive results are presented in [4]. Our main objective in formulating the TCP congestion control problem as we have here is to replace the existing TCP control, which employs no underlying model of the observed congestion (through RTTs), with a model based control approach. This approach is depicted in Figure 10. The design of adaptive robust controllers as in [3] will build on the system identification studies presented here.

Fig. 6. Estimated PSD for user 1 data

Fig. 7. Estimated PSD for user 1 prediction error

Fig. 8. Transfer Functions $G_1$

Fig. 9. Transfer Functions $H_1$

ABSTRACT

This  paper  presents  a  closed-form  solution  for  obtaining  the  opti¬ 
mal  coefficients  of  variable  FIR  filters  with  continuously  adjustable 
fractional-delay  (FD)  response.  The  design  is  formulated  as  a 
weighted-least-squares  (WLS)  approximation  problem  without  dis¬ 
cretizing  the  frequency  and  fractional-delay  parameters.  Com¬ 
pared  with  the  existing  WLS  method,  the  discretization-free  one 
can  yield  a  closed-form  optimal  solution  with  considerably  re¬ 
duced  computational  complexity. 

1.  INTRODUCTION 


2.  CLOSED-FORM  DESIGN 

In this paper, we consider the polynomial-based variable FD filter

$$ H(z, p) = \sum_{n=-N_1}^{N_2} a_n(p)\,z^{-n} \tag{1} $$

where $N_1$ and $N_2$ are positive integers determining the filter order $N = N_1 + N_2$, whose values are selected as

In  many  digital  signal  processing  (DSP)  applications  and  telecom¬ 
munications,  the  frequency  characteristics  of  digital  filters  are  re¬ 
quired  to  be  continuously  variable  (adjustable).  Such  digital  filters 
are referred to as variable digital filters [1, 2, 3, 4, 5, 6, 7]. Recently, variable filters with adjustable fractional-delay (FD) response have been found useful in various applications [8]. The survey paper [8] provides a comprehensive review and comparison of
most  of  the  existing  methods  for  designing  and  implementing  such 
variable  FD  filters.  Among  those  methods,  the  Lagrange  interpola¬ 
tion  method  is  considered  to  be  the  most  attractive  due  to  its  sim¬ 
plicity  [8],  but  the  frequency  response  of  the  resulting  Lagrange 
interpolator  cannot  be  uniformly  balanced  in  the  entire  frequency 
band, i.e., the frequency response in the low frequency region is better than that in the high frequency region, as demonstrated in [9]. Therefore, it is difficult to achieve a satisfactory design with low filter
orders  in  the  whole  frequency  band  by  using  the  Lagrange  interpo¬ 
lation  method.  To  solve  this  problem,  paper  [9]  proposed  a  general 
technique  using  the  weighted-least-squares  (WLS)  method,  which 
can  yield  a  more  satisfactory  design  with  lower  filter  order  than  the 
Lagrange interpolator. However, the WLS method requires sampling the frequency $\omega$ and the fractional-delay $p$. That is, parameter discretizations are necessary. Usually, the sampling grids of the
parameter  discretizations  must  be  dense  enough  to  guarantee  high 
design  accuracy,  which  in  turn  increases  the  computational  com¬ 
plexity  needed  in  filter  design  process.  Therefore,  it  is  desirable  to 
derive  a  closed-form  optimal  solution  without  discretizations. 

This  paper  presents  a  new  method  for  obtaining  the  closed- 
form optimal solution of variable FD filter coefficients by formulating the FD filter design as an integral WLS approximation problem without parameter discretizations [10]. Compared with the
existing  WLS  method  in  [9],  the  discretization-free  method  can 
achieve  higher  design  accuracy  with  considerably  reduced  compu¬ 
tational  complexity.  A  design  example  is  given  to  demonstrate  its 
effectiveness. 


$$ N_1 = \frac{N - 1}{2}, \qquad N_2 = \frac{N + 1}{2} $$

for odd $N$, and

$$ N_1 = N_2 = \frac{N}{2} $$

for even $N$, respectively. The filter coefficients $a_n(p)$ are expressed as polynomials of the fractional-delay $p$ as

$$ a_n(p) = \sum_{k=0}^{K} a(n, k)\,p^k \tag{2} $$

where $p \in [-0.5, 0.5]$. Thus the transfer function $H(z, p)$ can be rewritten as

$$ H(z, p) = \sum_{n=-N_1}^{N_2} \sum_{k=0}^{K} a(n, k)\,p^k z^{-n} = \mathbf{a}^T (\mathbf{p} \otimes \mathbf{z}) \tag{3} $$

where the notation $\otimes$ denotes the Kronecker product,

$$ \mathbf{p} = [\,1 \;\; p \;\; p^2 \;\cdots\; p^K\,]^T, \qquad \mathbf{z} = [\,z^{N_1} \;\cdots\; z \;\; 1 \;\; z^{-1} \;\cdots\; z^{-N_2}\,]^T, $$

and vector $\mathbf{a}$ is the column string of the matrix $A = [a(n, k)]$. The actual variable frequency response is

$$ H(\omega, p) = \mathbf{a}^T (\mathbf{p} \otimes \boldsymbol{\omega}) \tag{4} $$

where the complex vector $\boldsymbol{\omega}$ is

$$ \boldsymbol{\omega} = [\,e^{jN_1\omega} \;\cdots\; e^{j\omega} \;\; 1 \;\; e^{-j\omega} \;\cdots\; e^{-jN_2\omega}\,]^T. \tag{5} $$

Also, the desired variable frequency response is given by

$$ H_d(\omega, p) = e^{-j\omega p} \tag{6} $$

where the normalized frequency $\omega$ is in the range $\omega \in [0, \pi]$, and the fractional-delay $p$ is continuously variable in the range $p \in [-0.5, 0.5]$. Our objective here is to find the optimal coefficient vector $\mathbf{a}$ such that the weighted squared error of the variable frequency response

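$$ J(\mathbf{a}) = \int_0^{\pi}\!\int_{-0.5}^{0.5} W(\omega, p)\,|H(\omega, p) - H_d(\omega, p)|^2\,d\omega\,dp \tag{7} $$

is minimized. For deriving a closed-form solution, we assume that the 2-D weighting function $W(\omega, p)$ is separable as

$$ W(\omega, p) = W_1(\omega)\,W_2(p) \tag{8} $$

which is the product of 1-D stepwise functions

$$ \begin{aligned} W_1(\omega) &= \alpha_l \quad \text{for } \omega \in [\omega_{l-1}, \omega_l), \quad l = 1, 2, \ldots, L \\ W_2(p) &= \beta_m \quad \text{for } p \in [p_{m-1}, p_m), \quad m = 1, 2, \ldots, M \end{aligned} \tag{9} $$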

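where $\alpha_l$ and $\beta_m$ are constants. Since the frequency response error is

$$ e(\omega, p) = H(\omega, p) - H_d(\omega, p) = \mathbf{a}^T(\mathbf{p} \otimes \boldsymbol{\omega}) - e^{-j\omega p}, $$

we have

$$ \begin{aligned} |e(\omega, p)|^2 &= |\mathbf{a}^T(\mathbf{p} \otimes \boldsymbol{\omega}) - e^{-j\omega p}|^2 \\ &= \mathbf{a}^T(\mathbf{p} \otimes \boldsymbol{\omega})(\mathbf{p}^T \otimes \boldsymbol{\omega}^*)\mathbf{a} - 2\,\mathrm{Re}\bigl[\mathbf{a}^T(\mathbf{p} \otimes \boldsymbol{\omega})\,e^{j\omega p}\bigr] + 1 \\ &= e_1(\omega, p) - 2\,e_2(\omega, p) + 1 \end{aligned} \tag{10} $$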
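where $[\,\cdot\,]^*$ means the Hermitian adjoint, and $\mathrm{Re}[\,\cdot\,]$ denotes the real part of $[\,\cdot\,]$. By using the property of the Kronecker product that

$$ (A \otimes B)(C \otimes D) = (AC) \otimes (BD) \tag{11} $$

we obtain the first term in (10) as

$$ e_1(\omega, p) = \mathbf{a}^T(\mathbf{p} \otimes \boldsymbol{\omega})(\mathbf{p}^T \otimes \boldsymbol{\omega}^*)\mathbf{a} = \mathbf{a}^T\bigl[(\mathbf{p}\mathbf{p}^T) \otimes (\boldsymbol{\omega}\boldsymbol{\omega}^*)\bigr]\mathbf{a} \tag{12} $$

where

$$ \mathbf{p}\mathbf{p}^T = \begin{bmatrix} 1 & p & \cdots & p^K \\ p & p^2 & \cdots & p^{K+1} \\ \vdots & \vdots & \ddots & \vdots \\ p^K & p^{K+1} & \cdots & p^{2K} \end{bmatrix} \tag{13} $$

is a Hankel matrix, and $\boldsymbol{\omega}\boldsymbol{\omega}^*$ is a complex Hermitian Toeplitz matrix. Furthermore, the second term in (10) is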

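$$ e_2(\omega, p) = \mathrm{Re}\bigl[\mathbf{a}^T(\mathbf{p} \otimes \boldsymbol{\omega})\,e^{j\omega p}\bigr] = \mathbf{a}^T(\mathbf{p} \otimes \mathbf{q}) \tag{14} $$

where the real vector $\mathbf{q}$ is defined as

$$ \mathbf{q} = [\,\cos\omega(p + N_1) \;\cdots\; \cos p\omega \;\cdots\; \cos\omega(p - N_2)\,]^T. \tag{15} $$

Consequently, substituting (8) and (10) into (7) obtains

$$ J(\mathbf{a}) = J_1(\mathbf{a}) - 2J_2(\mathbf{a}) + J_3 \tag{16} $$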

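where

$$ \begin{aligned} J_1(\mathbf{a}) &= \int_0^{\pi}\!\int_{-0.5}^{0.5} W_1(\omega) W_2(p)\,e_1(\omega, p)\,d\omega\,dp \\ &= \mathbf{a}^T \left\{ \left[ \int_{-0.5}^{0.5} W_2(p)\,\mathbf{p}\mathbf{p}^T dp \right] \otimes \left[ \int_0^{\pi} W_1(\omega)\,\boldsymbol{\omega}\boldsymbol{\omega}^* d\omega \right] \right\} \mathbf{a} \\ &= \mathbf{a}^T (P \otimes \Omega_c)\,\mathbf{a} \end{aligned} \tag{17} $$

in which $\Omega_c$ denotes the second bracketed integral and

$$ P = \int_{-0.5}^{0.5} W_2(p)\,\mathbf{p}\mathbf{p}^T dp. \tag{18} $$

The above matrix $P$ can be computed by

$$ P = \int_{-0.5}^{0.5} W_2(p)\,\mathbf{p}\mathbf{p}^T dp = \sum_{m=1}^{M} \beta_m P_m, \qquad P_m = \int_{p_{m-1}}^{p_m} \mathbf{p}\mathbf{p}^T dp. $$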

It should be noted that since $P_m$ is a Hankel matrix, only its first column and last row are needed for generating $P_m$. Moreover, it is clear that the resulting matrix $P$ in (18) is also a Hankel matrix, whose size is $(K+1)$-by-$(K+1)$. On the other hand, although $\Omega_c$ in (17) is a complex Hermitian matrix, we only need to determine its real part for evaluating $J_1(\mathbf{a})$ since $J_1(\mathbf{a})$ is real-valued. The real part of $\Omega_c$ is calculated by

$$ \Omega = \mathrm{Re}[\Omega_c] = \sum_{l=1}^{L} \alpha_l\,\Omega_l \tag{19} $$

whose elements are

$$ \Omega_l(i, j) = \begin{cases} \omega_l - \omega_{l-1} & \text{if } i = j \\[4pt] \dfrac{\sin[(i - j)\omega_l] - \sin[(i - j)\omega_{l-1}]}{i - j} & \text{for } i \neq j \end{cases} \tag{20} $$

and $i, j = 1, 2, \ldots, (N+1)$.

It should also be pointed out that since $\Omega_l$ is a symmetric Toeplitz matrix, only its first row needs to be computed for generating $\Omega_l$. As a result, the matrix $\Omega$ is also a symmetric Toeplitz matrix, whose size is $(N+1)$-by-$(N+1)$.
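The closed-form entries of $\Omega_l$ in (20), and the moments that fill the Hankel matrix $P_m$, can be generated directly. The sketch below uses hypothetical helper names (`omega_block`, `hankel_block`) and builds each matrix from a single list of values, following the remark that one row (or the moment sequence) suffices. It is an illustration, not the authors' code.

```python
import math

def omega_block(w_lo, w_hi, size):
    """Real part of the integral of omega*omega^H over [w_lo, w_hi), as in (20)."""
    first_row = [w_hi - w_lo] + [
        (math.sin(k * w_hi) - math.sin(k * w_lo)) / k for k in range(1, size)
    ]
    # Symmetric Toeplitz: entry (i, j) depends only on |i - j|.
    return [[first_row[abs(i - j)] for j in range(size)] for i in range(size)]

def hankel_block(p_lo, p_hi, K):
    """Integral of p p^T over [p_lo, p_hi): entry (i, j) is the moment of order i + j."""
    moments = [(p_hi ** (s + 1) - p_lo ** (s + 1)) / (s + 1) for s in range(2 * K + 1)]
    return [[moments[i + j] for j in range(K + 1)] for i in range(K + 1)]

# Small illustrative sizes (assumptions): K = 2 and a 3-by-3 frequency block.
P_demo = hankel_block(-0.5, 0.5, 2)
Om_demo = omega_block(0.0, math.pi / 2, 3)
```

The full matrices then follow by the weighted sums $P = \sum_m \beta_m P_m$ and $\Omega = \sum_l \alpha_l \Omega_l$.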

Based on the above derivations of matrices $P$ and $\Omega$, the $J_1(\mathbf{a})$ in (16) can be obtained as

$$ J_1(\mathbf{a}) = \mathbf{a}^T (P \otimes \Omega)\,\mathbf{a}. \tag{21} $$

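In addition, the second term $J_2(\mathbf{a})$ in (16) is evaluated by

$$ \begin{aligned} J_2(\mathbf{a}) &= \int_0^{\pi}\!\int_{-0.5}^{0.5} W_1(\omega) W_2(p)\,\bigl[\mathbf{a}^T(\mathbf{p} \otimes \mathbf{q})\bigr]\,d\omega\,dp \\ &= \mathbf{a}^T \left\{ \int_{-0.5}^{0.5} W_2(p) \left[ \mathbf{p} \otimes \int_0^{\pi} W_1(\omega)\,\mathbf{q}\,d\omega \right] dp \right\} \\ &= \mathbf{a}^T \int_{-0.5}^{0.5} W_2(p)\,(\mathbf{p} \otimes \mathbf{u})\,dp = \mathbf{a}^T \mathbf{v} \end{aligned} \tag{22} $$

where

$$ \mathbf{u} = \int_0^{\pi} W_1(\omega)\,\mathbf{q}\,d\omega = \sum_{l=1}^{L} \alpha_l\,\mathbf{u}_l \tag{23} $$

and the elements of vector $\mathbf{u}_l$ are determined by

$$ u_l(n) = \int_{\omega_{l-1}}^{\omega_l} q(n)\,d\omega = \begin{cases} \omega_l - \omega_{l-1} & \text{if } \gamma = 0 \\[4pt] \dfrac{\sin(\gamma\omega_l) - \sin(\gamma\omega_{l-1})}{\gamma} & \text{if } \gamma \neq 0 \end{cases} \tag{24} $$

with $\gamma = p + N_1 - n + 1$, and $n = 1, 2, \ldots, (N+1)$. Furthermore, the vector $\mathbf{v}$ in (22) can be calculated as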


(17) 


$$ \mathbf{v} = \int_{-0.5}^{0.5} W_2(p)\,(\mathbf{p} \otimes \mathbf{u})\,dp = \sum_{m=1}^{M} \sum_{l=1}^{L} \alpha_l \beta_m\,\mathbf{v}_{lm} \tag{25} $$

where

$$ \mathbf{v}_{lm} = \int_{p_{m-1}}^{p_m} (\mathbf{p} \otimes \mathbf{u}_l)\,dp $$

whose elements can be computed by using numerical integration as

$$ v_{lm}(i) = \int_{p_{m-1}}^{p_m} p^k\,u_l(n)\,dp \tag{26} $$

$$ k = 0, 1, \ldots, K, \qquad n = 1, 2, \ldots, (N+1), \qquad i = k(N+1) + n. \tag{27} $$
Lastly,  the  third  term  in  (16)  is 


(1). Once the optimal coefficient vector $\mathbf{a}$ in (34) is obtained, we obtain the coefficients $a(n, k)$ in (3). Substituting different values of $p$ into (3) results in a variable filter $H(z, p)$ with different fractional delays, which can be implemented by using the Farrow structure [11].

(2). Since the range of the variable fractional-delay $p$ is $p \in [-0.5, 0.5]$, all non-integer delays with integer and fractional parts can be covered by just cascading $H(z, p)$ with $D$ delay elements ($z^{-D}$), where $D$ is a positive integer. Evidently, if $D \ge N_1$, $z^{-D} H(z, p)$ becomes causal.
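Remark (1) relies on evaluating the polynomials $a_n(p) = \sum_k a(n,k)\,p^k$ for a chosen delay $p$. A minimal sketch of that evaluation by Horner's rule is given below. The function name `fd_coefficients` and the two-tap coefficient table are assumptions for illustration (a linear interpolator, not the optimal WLS design of this paper).

```python
def fd_coefficients(A, p):
    """Evaluate a_n(p) = sum_k a(n, k) * p**k for every tap n by Horner's rule."""
    coeffs = []
    for row in A:                      # row = [a(n, 0), a(n, 1), ..., a(n, K)]
        acc = 0.0
        for a_nk in reversed(row):     # start from the highest power of p
            acc = acc * p + a_nk
        coeffs.append(acc)
    return coeffs

# Hypothetical table with K = 1: a_0(p) = 1 - p and a_1(p) = p.
A_demo = [[1.0, -1.0],
          [0.0, 1.0]]
taps = fd_coefficients(A_demo, 0.25)
```

Only $p$ changes at run time. The table $a(n, k)$ is fixed once the design is done, which is the property the Farrow structure exploits.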

3.  DESIGN  EXAMPLE 


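$$ J_3 = \int_0^{\pi}\!\int_{-0.5}^{0.5} W_1(\omega) W_2(p)\,d\omega\,dp = \text{constant}. \tag{28} $$

Thus, the final error function (16) can be formed by combining the derived $J_1(\mathbf{a})$, $J_2(\mathbf{a})$, and $J_3$ as

$$ J(\mathbf{a}) = \mathbf{a}^T (P \otimes \Omega)\,\mathbf{a} - 2\,\mathbf{a}^T \mathbf{v} + \text{constant}. \tag{29} $$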

The optimal solution for $\mathbf{a}$ can be found by setting the derivative of $J(\mathbf{a})$ with respect to $\mathbf{a}$ to zero, i.e.,

$$ \frac{\partial J(\mathbf{a})}{\partial \mathbf{a}} = \bigl[(P \otimes \Omega) + (P \otimes \Omega)^T\bigr]\mathbf{a} - 2\mathbf{v} = \mathbf{0}. \tag{30} $$

Since $P$ is a Hankel matrix and $\Omega$ is a symmetric Toeplitz matrix,

$$ P^T = P, \qquad \Omega^T = \Omega $$

which leads to

$$ (P \otimes \Omega)^T = P^T \otimes \Omega^T = P \otimes \Omega. $$

As a result, we obtain

$$ \mathbf{a} = (P \otimes \Omega)^{-1}\mathbf{v} = (P^{-1} \otimes \Omega^{-1})\,\mathbf{v}. \tag{31} $$

Since the direct inversion of matrices $P$ and $\Omega$ may suffer from ill-conditioning due to their large condition numbers, some measure must be taken to avoid numerical problems. Because both $P$ and $\Omega$ are positive definite matrices, they can be decomposed by using the Cholesky factorization as

$$ P = R^T R, \qquad \Omega = S^T S \tag{32} $$

where $R$ and $S$ are upper triangular matrices. We can thus indirectly compute the inverses of $P$ and $\Omega$ as

$$ P^{-1} = R^{-1} R^{-T}, \qquad \Omega^{-1} = S^{-1} S^{-T}. \tag{33} $$


In this section, we illustrate the proposed discretization-free method by designing a variable FD filter with the same design specification as that in [9]: the maximum absolute error of the variable frequency response, defined by

$$ e_{\max} = \max\{\, e(\omega, p) \mid \omega \in [0, 0.9\pi],\ p \in [-0.5, 0.5] \,\} \tag{35} $$

where

$$ e(\omega, p) = 20 \log_{10} |H(\omega, p) - H_d(\omega, p)| \tag{36} $$

should not exceed the level of $-100$ dB for any fractional-delay $p$ and any frequency $\omega$ in the above range of interest, i.e., $0 \le \omega \le 0.9\pi$ and $-0.5 \le p \le 0.5$.

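We have tackled this design problem by using the method of [9] and the proposed discretization-free method, and found that if the design parameters

$$ N = 65, \qquad K = 7 $$

$$ W_1(\omega) = \begin{cases} 0.64 & \text{for } \omega \in [0, 0.55\pi) \\ 4.9 & \text{for } \omega \in [0.55\pi, 0.85\pi) \\ 37 & \text{for } \omega \in [0.85\pi, 0.8996\pi) \\ 0 & \text{for } \omega \in [0.8996\pi, \pi] \end{cases} $$

$$ W_2(p) = \begin{cases} 53 & \text{for } p \in [-0.5, -0.4) \\ 0.2 & \text{for } p \in [-0.4, 0.4) \\ 8 & \text{for } p \in [0.4, 0.5] \end{cases} $$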

are selected, the above design specification can be satisfied by the proposed method. To compare the discretization-free method with the method in [9], the maximum absolute deviation of the frequency response defined in (35) and the root-mean-squared error defined by

$$ E_2 = \left[ \int_0^{0.9\pi}\!\int_{-0.5}^{0.5} |H(\omega, p) - H_d(\omega, p)|^2\,d\omega\,dp \right]^{1/2} \tag{38} $$


Substituting (33) into (31) and applying the property (11) yields the closed-form optimal solution

$$ \begin{aligned} \mathbf{a} &= \bigl[(R^{-1} R^{-T}) \otimes (S^{-1} S^{-T})\bigr]\mathbf{v} \\ &= (R^{-1} \otimes S^{-1})(R^{-T} \otimes S^{-T})\,\mathbf{v} \\ &= (R^{-1} \otimes S^{-1})\bigl[(R^{-T} \otimes S^{-T})\,\mathbf{v}\bigr]. \end{aligned} \tag{34} $$

It should be noticed that the re-grouping in the last expression of (34) is very important for ensuring a numerically stable optimal solution. Finally, some additional remarks should be given to clarify the following points.
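The regrouped solution (34) can be sketched without ever forming the Kronecker product. The example below assumes NumPy; the function name `solve_kron_cholesky` and the small test matrices are hypothetical. It uses the identity $(A \otimes B)\,\mathrm{vec}(X) = \mathrm{vec}(A X B^T)$ for row-major vectorisation, which matches the index ordering $i = k(N+1) + n$ of (27). Note that NumPy returns lower-triangular Cholesky factors, so $R$ and $S$ of (32) are their transposes.

```python
import numpy as np

def solve_kron_cholesky(P, Om, v):
    """Solve (P kron Om) a = v via Cholesky factors, as in the regrouping of (34)."""
    Lp = np.linalg.cholesky(P)       # P  = Lp Lp^T, so R = Lp^T in (32)
    Lo = np.linalg.cholesky(Om)      # Om = Lo Lo^T, so S = Lo^T in (32)
    V = v.reshape(P.shape[0], Om.shape[0])   # row-major: rows follow k, columns follow n
    # Inner bracket of (34): (R^-T kron S^-T) v  <->  Lp^-1 V Lo^-T
    W = np.linalg.solve(Lp, V)
    W = np.linalg.solve(Lo, W.T).T
    # Outer factor of (34): (R^-1 kron S^-1) w  <->  Lp^-T W Lo^-1
    A = np.linalg.solve(Lp.T, W)
    A = np.linalg.solve(Lo.T, A.T).T
    return A.reshape(-1)

# Small illustrative matrices (assumptions): a moment (Hankel) matrix for K = 2
# and a diagonally dominant symmetric Toeplitz matrix.
P_small = np.array([[1.0, 0.0, 1 / 12], [0.0, 1 / 12, 0.0], [1 / 12, 0.0, 1 / 80]])
Om_small = np.array([[np.pi, 0.5, 0.1], [0.5, np.pi, 0.5], [0.1, 0.5, np.pi]])
v_small = np.arange(1.0, 10.0)
a_fast = solve_kron_cholesky(P_small, Om_small, v_small)
a_ref = np.linalg.solve(np.kron(P_small, Om_small), v_small)
```

`np.linalg.solve` is used for the triangular systems for brevity. A dedicated triangular solver would exploit the structure further.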


are used to evaluate the design accuracy, and the number of floating-point operations (Flops) required for determining the optimal coefficient vector $\mathbf{a}$ is used to compare the computational complexity. Table 1 shows the design results of the two methods. It is observed from Table 1 that the proposed discretization-free method meets the design requirement, since its maximum deviation of the frequency response is below $-100$ dB, whereas the maximum deviation of the method in [9] is $-99.9208$ dB. Moreover, the proposed method requires only 5.58% of the Flops required by the method in [9]. Therefore, a significant reduction of computational complexity has been achieved. Fig. 1 depicts the variable fractional-delay response in the range $\omega \in [0, 0.9\pi]$ and $p \in [-0.5, 0.5]$. It is observed that the variable fractional-delay response of the designed variable FD filter is considerably flat in the entire range. Also, the absolute error of the variable frequency response is illustrated in Fig. 2, which shows a very small error in the range of interest.

4.  CONCLUSION 

A weighted-least-squares (WLS) method without parameter discretizations has been proposed for deriving the optimal closed-form solution of variable FD filter coefficients. Since the proposed method does not require sampling the frequency $\omega$ and the fractional-delay $p$ in obtaining the final closed-form solution, a considerable reduction of computational complexity and higher design accuracy can be achieved. An illustrative example has been presented to demonstrate the effectiveness of the proposed method.
Table 1: Design Errors and Computational Complexity

                   Method in [9]       Proposed Method
ε_max (dB)         -99.9208            -100.3683
ε_2                4.4931 x 10^(…)     4.4147 x 10^(…)
Flops Used         84057162            4694097

Fig. 1. Variable Fractional-Delay Response.

Fig. 2. Absolute Error of Variable Frequency Response.
ABSTRACT

This paper considers the design of a digital filter with prescribed magnitude and group delay specifications. First, we outline the derivation of the phase and group delay functions of an Nth order digital Laguerre filter and show that the group delay function of the filter can be written as a ratio of quadratic functions in the filter coefficients. Then, we formulate our filter design problem as a constrained L2 space minimization problem in which the performance requirements on the group delay and magnitude in the passband are treated as constraints while minimizing the L2 norm of the error function between the designed and the desired filters. Methods for solving the proposed nonlinear optimization problem are outlined. Numerical results are presented to illustrate the usefulness of the proposed method. As a special case, corresponding results for general FIR filters are also derived.


1.  Introduction 

In filter design literature, the problem of designing linear phase FIR filters with desired magnitude characteristics has been well studied [7]. Algorithms and efficient software are now readily available for the design of such filters. The linear-phase restriction imposed on the whole frequency range converts the filter design problem into a real approximation problem for which efficient classical methods for real approximation, such as the Remez exchange algorithm, can be utilised to find the optimal solution. However, linear phase filters with short transition bands introduce large delays, an undesirable feature in many applications such as adaptive filtering in acoustic echo cancellation. Moreover, the linear-phase restriction is not needed in the stopband. In practice, filters with a well-specified phase or group delay function in the passband together with desired magnitude characteristics are highly desirable in a variety of areas such as communication channel equalisation, chirp processing, and optimal beamforming with unequally spaced sensor arrays.

Imposing a phase or group delay requirement only in the passband results in a complex approximation problem. In recent years, such problems have attracted the attention of many researchers, see [2], [5], [6] and the references therein. In this paper, using a filter structure based on the set of orthonormal Laguerre functions, we shall investigate the design of digital IIR filters which satisfy prescribed magnitude and group delay requirements. First, we show that the group delay function of an Nth order digital Laguerre filter can be written as a ratio of quadratic functions with respect to the filter coefficient vector. Then, we utilise the expression of the group delay function to pose our filter design problem as a constrained L2 space optimization problem. Performance requirements on the group delay function and magnitude spectrum are treated as constraints while minimising the L2 norm of the error function between the designed and the desired filters. Methods for solving the nonlinear optimization problem are briefly outlined. Numerical results are presented to illustrate the usefulness of the proposed method. The results obtained are also applicable to the design of FIR filters, which are a special case of the digital Laguerre filters.

2.  Laguerre  Filter  and  Its  Phase  and  Group 
Delay  Functions 

Consider  the  following  Nth  order  Laguerre  filter 


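$$ H_L(z) = \sum_{k=0}^{N-1} x_k \phi_k(z) \tag{1} $$

where the $\phi_k(z)$ are the Laguerre functions defined as

$$ \phi_k(z) = \frac{\sqrt{1-a^2}}{1 - a z^{-1}} \left[ \frac{z^{-1} - a}{1 - a z^{-1}} \right]^k, \qquad k = 0, 1, 2, \ldots $$

Define

$$ A_0(z) = \frac{\sqrt{1-a^2}}{1 - a z^{-1}}, \qquad A_{ap}(z) = \frac{z^{-1} - a}{1 - a z^{-1}} \tag{2} $$

where $A_0(z) = \phi_0(z)$ is a lowpass filter for 0 < a < 1 and a highpass filter for -1 < a < 0, and $A_{ap}(z)$ is an allpass filter. Note that $H_L(z)$ is an N-tap FIR filter for a = 0.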

2.1. The Phase Function of $H_L(z)$

Let us outline the derivation of the phase function of $H_L(z)$. A detailed derivation can be found in [10].

First, it is easy to verify that $A_0(e^{j\omega})$ can be written as


‘This  project  was  partially  supported  by  a  research  grant  from 
$$ A_0(e^{j\omega}) = \frac{\sqrt{1-a^2}}{\sqrt{1 - 2a\cos\omega + a^2}}\; e^{-j\arctan\left[\frac{a\sin\omega}{1 - a\cos\omega}\right]} $$

Therefore, the phase function of $A_0(z)$, denoted as $\theta_0(\omega)$, can be expressed as

$$ \theta_0(\omega) = -\arg(A_0(e^{j\omega})) = \arctan\left[\frac{a\sin\omega}{1 - a\cos\omega}\right] \tag{3} $$

Similarly, the phase function of the allpass filter $A_{ap}(e^{j\omega})$ can be written as

$$ \theta_{ap}(\omega) = \omega + 2\arctan\left[\frac{a\sin\omega}{1 - a\cos\omega}\right] \tag{4} $$

Combining (4) with (3), we can conclude that the phase function, denoted as $\theta_k(\omega)$, of $\phi_k(z)$ is given by

$$ \theta_k(\omega) = k\omega + (2k+1)\arctan\left[\frac{a\sin\omega}{1 - a\cos\omega}\right] \tag{5} $$
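The phase expression (5) can be checked numerically by evaluating the Laguerre function directly from its transfer function and comparing it with the polar form built from (5). The sketch below is an illustrative check only; the function names and the test values of a and ω are assumptions chosen for the example.

```python
import cmath
import math

def laguerre(k, a, w):
    """phi_k(e^{jw}) evaluated directly from the transfer function."""
    zinv = cmath.exp(-1j * w)
    lowpass = math.sqrt(1 - a * a) / (1 - a * zinv)
    allpass = (zinv - a) / (1 - a * zinv)
    return lowpass * allpass ** k

def theta(k, a, w):
    """Phase function of phi_k according to (5)."""
    return k * w + (2 * k + 1) * math.atan(a * math.sin(w) / (1 - a * math.cos(w)))

def gamma(a, w):
    """Magnitude of A_0(e^{jw}); the allpass sections have unit magnitude."""
    return math.sqrt(1 - a * a) / math.sqrt(1 - 2 * a * math.cos(w) + a * a)

def polar_form(k, a, w):
    """gamma(w) * exp(-j * theta_k(w)), which should equal phi_k(e^{jw})."""
    return gamma(a, w) * cmath.exp(-1j * theta(k, a, w))

# Assumed test point: pole a = 0.25, frequency w = 0.7 rad.
for k in range(4):
    print(k, abs(laguerre(k, 0.25, 0.7) - polar_form(k, 0.25, 0.7)))
```

The printed differences are at the level of floating-point round-off, which is consistent with $\phi_k(e^{j\omega})$ having magnitude $\gamma(\omega)$ and phase $-\theta_k(\omega)$.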


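As a result, $H_L(e^{j\omega})$ can be written as

$$ H_L(e^{j\omega}) = \sum_{k=0}^{N-1} x_k \phi_k(e^{j\omega}) = \gamma(\omega) \sum_{k=0}^{N-1} x_k e^{-j\theta_k(\omega)} $$

where $\gamma(\omega)$ denotes the magnitude spectrum of $A_0(z)$,

$$ \gamma(\omega) = \frac{\sqrt{1-a^2}}{\sqrt{1 - 2a\cos\omega + a^2}} $$

Hence, the phase function of $H_L(z)$ can be expressed as

$$ \theta_{H_L}(\omega) = \arctan\left[\frac{\sum_{k=0}^{N-1} x_k \sin\theta_k(\omega)}{\sum_{k=0}^{N-1} x_k \cos\theta_k(\omega)}\right] \tag{6} $$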


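2.2. The Group Delay Function of $H_L(z)$

Using (6), the group delay function of $H_L(z)$ is given by

$$ \tau_g(\omega) = \frac{d}{d\omega}\arctan\left[\frac{\sum_{k=0}^{N-1} x_k \sin\theta_k(\omega)}{\sum_{k=0}^{N-1} x_k \cos\theta_k(\omega)}\right] $$

It can be proved [10], through some algebra, that $\tau_g$ can be expressed as

$$ \tau_g(\omega) = \frac{x^T P_\Gamma(\omega)\, x}{x^T P(\omega)\, x} \tag{7} $$

where $x = [x_0, x_1, \ldots, x_{N-1}]^T$ and

$$
\begin{aligned}
P(\omega) &= C(\omega)C^T(\omega) + S(\omega)S^T(\omega) \\
P_\Gamma(\omega) &= (C(\omega)C^T(\omega) + S(\omega)S^T(\omega))\,\Gamma(\omega) \\
\Gamma(\omega) &= \mathrm{diag}\{\Gamma_0(\omega), \Gamma_1(\omega), \ldots, \Gamma_{N-1}(\omega)\} \\
\Gamma_k(\omega) &= k + (2k+1)\,\frac{a(\cos\omega - a)}{1 - 2a\cos\omega + a^2} \\
C(\omega) &= [\cos\theta_0(\omega), \cos\theta_1(\omega), \ldots, \cos\theta_{N-1}(\omega)]^T \\
S(\omega) &= [\sin\theta_0(\omega), \sin\theta_1(\omega), \ldots, \sin\theta_{N-1}(\omega)]^T
\end{aligned}
$$

and $\theta_k(\omega)$ is given by (5). From (7) we know that the group delay function of an Nth order Laguerre filter is a ratio of two quadratic functions with respect to the filter coefficient vector x.

Note that, in general, $P_\Gamma$ is not a symmetric matrix. However, since for any real x, $x^T P_\Gamma(\omega) x$ is always real for any $\omega$, it can be concluded [?] that

$$ x^T P_\Gamma(\omega)\, x = \tfrac{1}{2}\, x^T (P_\Gamma(\omega) + P_\Gamma^T(\omega))\, x $$

Obviously, $Q_\Gamma(\omega) = \tfrac{1}{2}(P_\Gamma(\omega) + P_\Gamma^T(\omega))$ is a symmetric matrix. As a result, we have

$$ \tau_g(\omega) = \frac{x^T Q_\Gamma(\omega)\, x}{x^T P(\omega)\, x} \tag{8} $$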
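The quadratic-ratio form of the group delay in (7) can be compared against a numerical derivative of the phase in (6). The sketch below is a minimal check under assumed data: the coefficient vectors, the pole a, and the frequency are arbitrary example values, and the function names are hypothetical. It uses the fact that the (m, k) entry of $CC^T + SS^T$ equals $\cos(\theta_m - \theta_k)$.

```python
import math

def theta(k, a, w):
    """Phase of the k-th Laguerre function, as in (5)."""
    return k * w + (2 * k + 1) * math.atan(a * math.sin(w) / (1 - a * math.cos(w)))

def gamma_k(k, a, w):
    """Derivative of theta_k with respect to w (the diagonal entries of Gamma)."""
    return k + (2 * k + 1) * a * (math.cos(w) - a) / (1 - 2 * a * math.cos(w) + a * a)

def group_delay(x, a, w):
    """Group delay as the ratio of quadratic forms x^T P_Gamma x / x^T P x."""
    n = len(x)
    th = [theta(k, a, w) for k in range(n)]
    num = den = 0.0
    for m in range(n):
        for k in range(n):
            p_mk = math.cos(th[m] - th[k])   # entry (m, k) of C C^T + S S^T
            num += x[m] * p_mk * gamma_k(k, a, w) * x[k]
            den += x[m] * p_mk * x[k]
    return num / den

def phase(x, a, w):
    """Phase function of H_L as in (6), using atan2 for a well-defined branch."""
    s = sum(xk * math.sin(theta(k, a, w)) for k, xk in enumerate(x))
    c = sum(xk * math.cos(theta(k, a, w)) for k, xk in enumerate(x))
    return math.atan2(s, c)

def group_delay_numeric(x, a, w, h=1e-6):
    """Central-difference derivative of the phase; valid away from phase wraps."""
    return (phase(x, a, w + h) - phase(x, a, w - h)) / (2 * h)

# Assumed example: a short Laguerre filter with pole a = 0.25.
x_lag = [1.0, 0.5, 0.25]
print(group_delay(x_lag, 0.25, 0.7), group_delay_numeric(x_lag, 0.25, 0.7))

# FIR special case (a = 0): a symmetric 5-tap filter has constant delay (N-1)/2.
x_fir = [1.0, 2.0, 3.0, 2.0, 1.0]
print(group_delay(x_fir, 0.0, 0.3))
```

For the symmetric FIR example the ratio evaluates to the familiar linear-phase delay of (N-1)/2 samples, which is a convenient sanity check on the quadratic forms.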


2.3.  Phase  and  Group  Delay  Functions  of  an  FIR 
Filter 

For a given FIR filter $H_F(z) = \sum_{k=0}^{N-1} x_k z^{-k}$, its phase and group delay functions can be readily calculated by setting a = 0 in the expressions (6) and (8). It is easy to verify that the phase function of $H_F(z)$ is

$$ \theta_F(\omega) = \arctan\left[\frac{\sum_{k=0}^{N-1} x_k \sin(k\omega)}{\sum_{k=0}^{N-1} x_k \cos(k\omega)}\right] \tag{9} $$

The corresponding group delay function is

$$
\begin{aligned}
\tau_{gF}(\omega) &= \frac{x^T Q_{\Gamma F}(\omega)\, x}{x^T P_F(\omega)\, x} \\
Q_{\Gamma F}(\omega) &= \tfrac{1}{2}(P_{\Gamma F}(\omega) + P_{\Gamma F}^T(\omega)) \\
P_{\Gamma F}(\omega) &= P_F(\omega)\,\Gamma_F \\
P_F(\omega) &= C_F(\omega)C_F^T(\omega) + S_F(\omega)S_F^T(\omega) \\
\Gamma_F &= \mathrm{diag}\{0, 1, 2, \ldots, N-1\} \\
C_F(\omega) &= [1, \cos\omega, \ldots, \cos(N-1)\omega]^T \\
S_F(\omega) &= [0, \sin\omega, \ldots, \sin(N-1)\omega]^T
\end{aligned} \tag{10}
$$

As in the digital Laguerre filter case, the group delay function of an N-tap FIR filter can be expressed as a ratio of two quadratic functions in the filter coefficient vector x.

Remark: Note that an IIR filter is the ratio of two FIR filters. Therefore, the group delay function of a given IIR filter can be expressed as the difference between the group delay function of the numerator and that of the denominator.


3.  Frequency  Domain  Digital  Filter  Design  with 
Magnitude  and  Group  Delay  Constraints 

It has been advocated [1], [8] (at least in the context of linear phase FIR filter design) that minimizing the L2 norm of a suitably chosen error function subject to relevant peak constraints is one of the most meaningful approaches for filter design. In the following, we adopt this point of view in the formulation and solution of our filter design problem as a constrained L2 space nonconvex optimization problem.
3.1. Problem formulation and conversion

To start with, let us consider those filters whose frequency responses can be expressed as a linear combination of a finite set of the orthonormal Laguerre functions $\phi_k$. That is

$$ H_L(z) = \sum_{k=0}^{N-1} x_k \phi_k(z) \tag{11} $$

Without loss of generality, we consider the following constrained filter design problem, denoted as problem (P):

$$ \min_x \; \frac{1}{2\pi} \int_{-\pi}^{\pi} \Phi(\omega)\, |H_L(e^{j\omega}) - H_d(\omega)|^2 \, d\omega \tag{12} $$

subject to

$$ |H_L(e^{j\omega}) - H_d(\omega)| \le \delta_m, \qquad |\tau_g(\omega) - \tau_d(\omega)| \le \delta_g, \qquad \omega \in \Omega_p $$

where $\Phi(\omega)$ is a weighting function, $H_d(\omega)$ the desired frequency response to be modelled/approximated, $\tau_d$ the desired group delay function, and $\Omega_p$ and $\Omega_s$ are the sets of passband and stopband frequency points. Note that in the time domain (12) is equivalent to minimizing the mean square (MMS) error between the outputs of the desired and the designed systems driven by the same random input sequence with $\Phi(\omega)$ as its power spectral density function. To simplify problem (P), we note that the objective function can be written as

$$ \frac{1}{2\pi} \int_{-\pi}^{\pi} \Phi(\omega)\, |H_L(e^{j\omega}) - H_d(\omega)|^2 \, d\omega = \tfrac{1}{2}\, x^T H_f\, x - b_f^T x + c_f \tag{13} $$

where, with $\phi(e^{j\omega}) = [\phi_0(e^{j\omega}), \phi_1(e^{j\omega}), \ldots, \phi_{N-1}(e^{j\omega})]^T$,

$$
\begin{aligned}
H_f &= \frac{1}{\pi} \int_{-\pi}^{\pi} \Phi(\omega)\, \Re\{\phi(e^{j\omega})\phi^H(e^{j\omega})\} \, d\omega \\
b_f &= \frac{1}{\pi} \int_{-\pi}^{\pi} \Re\{\Phi(\omega)\, \phi(e^{j\omega}) H_d^*(\omega)\} \, d\omega \\
c_f &= \frac{1}{2\pi} \int_{-\pi}^{\pi} \Phi(\omega)\, |H_d(\omega)|^2 \, d\omega
\end{aligned}
$$

Let us now consider the linearisation of the magnitude constraint. For any complex number $z = \xi + j\eta$, from the rotation theorem [7] we know that

$$ |z| = \max_{0 \le \theta < 2\pi} \Re\{z e^{j\theta}\} \tag{14} $$

Hence, the magnitude inequality constraint can be written as

$$ \Re\{(H_L(e^{j\omega}) - H_d(\omega))\, e^{j\theta}\} \le \delta_m, \qquad \forall \theta \in [0, 2\pi) \tag{15} $$

It is easy to verify that (15) can be written as

$$ a^T(\omega, \theta)\, x \le c(\omega, \theta) \tag{16} $$

where $a(\omega, \theta) = \Re\{\phi(\omega) e^{j\theta}\}$, $\phi = [\phi_0, \phi_1, \ldots, \phi_{N-1}]^T$ and $c(\omega, \theta) = \Re\{H_d(\omega) e^{j\theta}\} + \delta_m$. Constraint (16) is linear and continuous.

To summarise, using (8), (13) and (16), the optimization problem (P) can be written as

$$ \min_x \; \{\tfrac{1}{2}\, x^T H_f\, x - b_f^T x\} \tag{17} $$

subject to

$$ a^T(\omega, \theta)\, x \le c(\omega, \theta), \qquad \forall (\omega, \theta) \in \Omega_p \times [0, 2\pi) \tag{18} $$

$$ \left| \frac{x^T Q_\Gamma(\omega)\, x}{x^T P(\omega)\, x} - \tau_d(\omega) \right| \le \delta_g, \qquad \forall \omega \in \Omega_p \tag{19} $$

Remark: Due to the group delay constraint, problem (P) is a general nonlinear optimization problem. This means that only locally optimal solutions (if they exist) can be expected. However, our numerical simulation experience indicates that a good choice of starting point can lead to a very satisfactory solution. For instance, since the objective function is convex and the magnitude constraints are linear, one way to obtain a good solution is to choose as starting point the solution to the optimization problem (17)-(18), which is a quadratic programming problem. Alternatively, the linearization approach in [3] can also be utilized to provide such a starting point for solving the general nonlinear optimization problem.

3.2. Suboptimal Approach

To simplify the group delay constraints, we consider the case when $|H_L(e^{j\omega})|^2 \approx |H_d(\omega)|^2$ in the passband (note that this can be guaranteed by the magnitude constraint with sufficiently small $\delta_m$). It can be proved [10] that

$$ |H_L(e^{j\omega})|^2 = \gamma^2(\omega)\,(x^T P(\omega)\, x) $$


This means that in the passband we have

$$ x^T P(\omega)\, x \approx \frac{|H_d(\omega)|^2}{\gamma^2(\omega)} $$

Therefore, the group delay function can be written as

$$ \tau_g(\omega) \approx x^T Q_a(\omega)\, x $$

where $Q_a(\omega) = \gamma^2(\omega)\, Q_\Gamma(\omega) / |H_d(\omega)|^2$. This approximation converts the group delay constraint into a quadratic constraint which, according to our experience, can be handled much more easily using existing optimization subroutines such as constr.m in Matlab's Optimization Toolbox [4]. This suggests that, as an alternative, we consider the following simplified version of the nonlinear optimization problem (P).

Problem (Pb): Solve the following optimization problem for a discrete set of frequencies $\{\omega_i\}$:

$$ \min_x \; \{\tfrac{1}{2}\, x^T H_f\, x - b_f^T x\} \tag{20} $$

subject to

$$ |x^T \gamma^2(\omega) P(\omega)\, x - |H_d(\omega)|^2| \le \epsilon_p $$
$$ |x^T Q_a(\omega)\, x - \tau_d| \le \delta_g $$

This is a quadratic problem with quadratic constraints. Specific methods exist [9] for solving such problems.
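The rotation-theorem step used in Section 3.1 to linearise the magnitude constraint, (14) to (16), can be illustrated numerically: the modulus of a complex number is the largest real part obtained over all rotations, so one bound on the modulus is equivalent to a family of linear bounds indexed by the angle. The sketch below is only an illustration; the grid size and the test value of z are assumptions, and a finite grid of angles slightly relaxes the original constraint.

```python
import cmath
import math

def rotated_real_parts(z, n_angles):
    """Real parts of z * exp(j*theta) on a uniform grid of theta in [0, 2*pi)."""
    return [(z * cmath.exp(1j * 2 * math.pi * i / n_angles)).real
            for i in range(n_angles)]

def modulus_via_rotations(z, n_angles=3600):
    """Approximate |z| as the maximum rotated real part, as in (14)."""
    return max(rotated_real_parts(z, n_angles))

def satisfies_linear_family(z, delta, n_angles=3600):
    """Check the discretised family of linear constraints Re{z e^{j theta}} <= delta."""
    return all(r <= delta for r in rotated_real_parts(z, n_angles))

z = 3 + 4j                      # assumed test value, |z| = 5
approx = modulus_via_rotations(z)
print(abs(z), approx)
print(satisfies_linear_family(z, 5.1), satisfies_linear_family(z, 4.9))
```

The maximum over the grid never exceeds |z| and approaches it as the grid is refined, which is why (18) is stated for every angle in [0, 2π) rather than for a finite set.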
4. Numerical Examples

For illustration, consider the design of a 36th order Laguerre filter to approximate the desired frequency response $H_d(\omega) = H_r(\omega) e^{-j\tau_d\omega}$, where $H_r(\omega)$ is the frequency response of the noncausal and zero phase raised cosine filter defined by $H_r(\omega) = T$ for $|\omega| \le (1-\alpha)\pi/T$, $H_r(\omega) = \frac{T}{2}\left(1 - \sin\left(\frac{T}{2\alpha}\left(|\omega| - \frac{\pi}{T}\right)\right)\right)$ for $(1-\alpha)\pi/T \le |\omega| \le (1+\alpha)\pi/T$, and $H_r(\omega) = 0$ elsewhere, and $\tau_d$ is the desired group delay constant. Choosing $\alpha = 0.35$, N = 36, a = 0.25, $\tau_d = 14$, and $\delta_m = \delta_g = 0.2$, we solve the filter design optimization problem (Pb). The magnitude, group delay and impulse response are depicted, respectively, in Figs. 1-3. The figures indicate very satisfactory design results.


Figure 1. Plot of magnitude response of the designed Laguerre filter.

Figure 2. Plot of group delay function of the designed Laguerre filter.

Figure 3. Plot of impulse response of the designed Laguerre filter.

5. Concluding Remarks

We have shown that the group delay function of a given Nth order digital Laguerre filter can be written as a ratio of quadratic functions with respect to the filter coefficients. We have posed a filter design problem as a constrained L2 space minimization problem with both magnitude and group delay constraints. The approach is practically relevant, and simulation results demonstrate the usefulness of our theoretical results and proposed solution methods.
ABSTRACT

Combined spatial and time-frequency signatures of signal arrivals at multi-antenna receivers have recently been used for effective nonstationary interference suppression in broadband communication platforms. This paper presents a performance analysis of subspace projection array processing techniques for suppression of frequency modulated (FM) jammers in GPS receivers. The FM jammers are instantaneously narrowband and have clear time-frequency (t-f) signatures that are distinct from the GPS C/A spread spectrum code. The paper assumes that the spatial signature of the jammer is accurately estimated, but that its instantaneous frequency (IF) estimate, which provides the basis for construction of the jammer subspace, exhibits zero-mean independent Gaussian errors. Simulation results comparing the effects of IF errors in single- and multi-antenna GPS receivers are provided.


1.  INTRODUCTION 

Recently, subspace projection techniques based on time-frequency distributions and bilinear transforms have been devised for non-stationary FM interference excision in direct-sequence spread-spectrum (DSSS) communications using single- and multi-antenna receivers [1][2][3][4]. These techniques exploit the jammer time-frequency signatures and rely on the distinct differences in the time-frequency localization properties between the jammer and the spread spectrum signals. The jammer instantaneous frequency, whether provided by the time-frequency distributions or any other IF estimator, is used to define the temporal signature of the interference, which is in turn used to construct the interference subspace. The respective projection matrix is used to excise the jammer power in the incoming signal prior to correlation with the receiver pseudorandom noise (PN) sequence. The result is improved receiver signal-to-interference-plus-noise ratio (SINR). The use of a multi-antenna array further improves the performance by increasing the dimension of the available signal subspace. It allows the distinctions in both the spatial and temporal signatures of the GPS signals from those of the interferers to play equal roles in suppressing the jammer with a minimum distortion of the desired signal [1][5].

As the jammer subspace is solely determined by the jammer IF, reliable estimation of the IF is important for FM interference mitigation. With perturbations in the IF, the GPS receiver anti-jamming performance is degraded, lowering the receiver SINR. The single antenna receiver case was discussed in [2], where it was shown that small IF estimation errors may lead to a significant decrease in the receiver SINR. This paper analyzes the multi-antenna GPS receiver performance in the presence of zero-mean independent and identically distributed Gaussian IF estimation errors. Accurate estimates of the jammer spatial signatures are assumed, and as such, exact values of the cross-correlation coefficients between the signal and the jammers are used. In comparing the single and multi-antenna cases, the paper shows that the employment of antenna arrays can improve the receiver SINR effectively by exploiting the difference in spatial signatures.


2.  SUBSPACE  PROJECTION  ARRAY 
PROCESSING 

Once  the  IF  of  the  jammer  is  estimated  from  the  t-f  domain,  or  by 
using  any  other  appropriate  IF  estimator,  the  jammer  vector  can 
be  constructed  up  to  an  ambiguity  in  a  constant  complex 
amplitude.  Subspace  projection  techniques  perform  jammer 
suppression  by  projecting  the  received  data  onto  the  jammer’s 
orthogonal  subspace.  Herein,  the  focus  is  on  jamming  of  GPS 
receivers. 
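The projection idea described above can be sketched in a few lines, using the construction that the paper gives in (5), (6) and (10): the jammer signature is the Kronecker product of a unit-norm temporal vector u and a spatial signature a with squared norm N, and the data are projected onto the orthogonal complement of that signature. Everything numeric below is an assumption chosen for illustration (block length, chirp phase, spatial signature, jammer amplitude, error standard deviation), and the helper names are hypothetical.

```python
import cmath
import math
import random

def kron_vec(u, a):
    """Kronecker product of two vectors (temporal index outer, sensor index inner)."""
    return [ui * aj for ui in u for aj in a]

def inner(x, y):
    """Inner product x^H y."""
    return sum(xi.conjugate() * yi for xi, yi in zip(x, y))

def norm(x):
    return math.sqrt(sum(abs(xi) ** 2 for xi in x))

def project_out(x, U, N):
    """Apply V = I - (1/N) U U^H to x without forming the matrix (U^H U = N)."""
    c = inner(U, x) / N
    return [xi - c * ui for xi, ui in zip(x, U)]

P, N = 16, 2                                            # assumed block length and array size
phase = [0.5 * math.pi * k * k / P for k in range(P)]   # assumed linear-FM (quadratic) phase
u = [cmath.exp(1j * ph) / math.sqrt(P) for ph in phase]         # unit-norm temporal signature
a = [complex(1.0), cmath.exp(1j * 0.8)]                 # spatial signature, squared norm N
U = kron_vec(u, a)

A = 100.0                                               # assumed jammer amplitude
X_u = [A * math.sqrt(P) * v for v in U]                 # jammer part of one data block

# Exact IF knowledge: the jammer is removed up to round-off.
residual_exact = norm(project_out(X_u, U, N))

# IF known only up to i.i.d. Gaussian phase errors: part of the jammer leaks through.
rng = random.Random(0)
sigma = 0.1
u_err = [cmath.exp(1j * (ph + rng.gauss(0.0, sigma))) / math.sqrt(P) for ph in phase]
residual_err = norm(project_out(X_u, kron_vec(u_err, a), N))

print(norm(X_u), residual_exact, residual_err)
```

With the exact signature the residual jammer is at round-off level, whereas with perturbed phases a visible fraction of the jammer survives the projection, which is the effect analysed in Section 3.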

In GPS, the PN sequence of length P (1023) repeats itself Q (20) times within one symbol of the 50 bps navigation data [6][7]. Discrete-time form is used, where all the signals are sampled at the chip rate of the C/A code. Now, consider an antenna array of N sensors, and a communication channel restricted to flat fading. In the proposed interference excision approach, the PNQ sensor output samples are partitioned into Q blocks, each of P chips and PN samples. The jammer can be consecutively removed from the 20 blocks. This is achieved by projecting the received data in each block onto the corresponding orthogonal subspace of the jammer. The jammer-free signal is then correlated with the replica PN sequence on a symbol-by-symbol basis. Subspace projection within each block is first considered. The array output vector at the kth sample is given by


$$ x(k) = x_s(k) + x_u(k) + w(k) = p(k)\,h + A\sqrt{P}\,u(k)\,a + w(k) \tag{1} $$

where $x_s$, $x_u$, and w are the signal spreading code, the jammer, and the white Gaussian noise contributions, respectively, h is the signal spatial signature, and p(k) is the spreading PN sequence. The jammer is considered as an instantaneously narrowband FM signal with constant amplitude, $u(k) = \frac{1}{\sqrt{P}}\exp[j\phi(k)]$ (the jammer is normalized within each block). A and a are the jammer amplitude and spatial signature, respectively. Furthermore, the spatial channels are normalized and it is presumed that $\|h\|_F^2 = N$ and $\|a\|_F^2 = N$, where $\|\cdot\|_F$ is the Frobenius norm of a vector. The noise vector w(k) is zero-mean, temporally and spatially white with

$$ E[w(k)\,w^T(k+l)] = 0, \qquad E[w(k)\,w^H(k+l)] = \sigma^2 \delta(l)\, I_N \tag{2} $$

where $\sigma^2$ is the noise power, and $I_N$ is the N x N identity matrix. Using P sequential array vector samples within the block, the following PN x 1 data vector for one block is obtained:

$$ X = [x^T(1)\; x^T(2)\; \ldots\; x^T(P)]^T = X_s + X_u + W. \tag{3} $$

The spatial-temporal signature $X_s$ can be rewritten as

$$ X_s = p \otimes h \tag{4} $$

where $p = [p(1), p(2), \ldots, p(P)]^T$ is the signal in one block, and $\otimes$ denotes the Kronecker product. In the same way, the jammer vector $X_u$ is expressed as

$$ X_u = A\sqrt{P}\, u \otimes a = A\sqrt{P}\, U \tag{5} $$

where $u = [u(1), u(2), \ldots, u(P)]^T$ is the jammer normalized vector, and $U = u \otimes a$ is the spatial-temporal signature. Its orthogonal subspace projection matrix is given by

$$ V = I_{PN} - U(U^H U)^{-1} U^H = I_{PN} - \frac{1}{N}\, U U^H \tag{6} $$

With the exact knowledge of the jammer IF, the projection of the received signal vector onto the orthogonal subspace yields

$$ X_\perp = V X = V X_s + V W \tag{7} $$

which excises the jammers. The result of despreading over one block is

$$ y = X_s^H X_\perp = X_s^H V X_s + X_s^H V W \triangleq y_1 + y_2 \tag{8} $$

where $y_1$ and $y_2$ are the contributions of the PN and the noise sequences to the correlator output, respectively. For simplification, the jammers are assumed to share the same period as the GPS C/A code. $y_1$ is deterministic due to the fact that the spreading code for each satellite signal is fixed and periodic. The decision variable is the real part of the sum of the correlation output over Q blocks.

3. EFFECTS OF IF ERRORS ON THE PROJECTION OPERATION

Errors in IF may occur in many situations, where it becomes difficult to determine the IF due to a drop in the jammer power, presence of amplitude modulations, or high levels of cross-terms in the t-f domain. When IF estimation errors exist, the subspace projection operation will not entirely remove the jammer. The un-excised residual jammer at the projection filter output is often significant, specifically for high JSR. In this paper, the phase error model is a zero-mean Gaussian white noise process, motivated by the fact that phase errors, directly obtained from the analytic signal of FM in complex Gaussian additive noise, have wrapped normal distributions [8]. For high jammer power, the distribution variance becomes very small and the phase errors assume a Gaussian distribution.

The estimated unit jammer vector can be represented as

$$ \tilde{u} = \frac{1}{\sqrt{P}} \left[ e^{j(\phi(1)+\Delta(1))},\; e^{j(\phi(2)+\Delta(2))},\; \ldots,\; e^{j(\phi(P)+\Delta(P))} \right]^T \tag{9} $$

The phase estimation errors $\Delta(i)$ at different chips are assumed to be i.i.d. random variables with a zero-mean Gaussian distribution and variance $\sigma_\Delta^2$. The variance is assumed to be sufficiently small such that most errors lie inside the interval $[-\pi, \pi]$. The projection matrix, constructed from the inaccurate jammer vector, is

$$ \tilde{V} = I_{PN} - \frac{1}{N}\, \tilde{U}\tilde{U}^H \tag{10} $$

where $\tilde{U} = \tilde{u} \otimes a$. In this case, the output of the correlator in one block is

$$ y = X_s^H \tilde{V} X_s + X_s^H \tilde{V} W + X_s^H \tilde{V} X_u \triangleq y_1 + y_2 + y_3 \tag{11} $$

where $y_1$, $y_2$ and $y_3$ represent, respectively, the contributions of the spreading code, the noise sequence, and the interfering signal. Due to phase estimation errors, these three variables are random, which renders equation (11) different from its deterministic counterpart, equation (8). Since the projection matrix $\tilde{V}$ is Hermitian, $y_1$ is always real. The mean value of $y_1$ is

$$ E[y_1] = E[X_s^H \tilde{V} X_s] = PN - \frac{1}{N}\, E[X_s^H \tilde{U}\tilde{U}^H X_s] \tag{12} $$

Define

$$ \alpha = \frac{h^H a}{N} \tag{13} $$

as the spatial cross-correlation coefficient between the signal and the jammer, and

$$ \beta = \frac{p^T u}{\sqrt{P}}, \qquad \tilde{\beta} = \frac{p^T \tilde{u}}{\sqrt{P}} \tag{14} $$

as the exact and estimated temporal cross-correlation coefficients between the PN sequence and the jammer vector, respectively. Therefore,

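$$ X_s^H \tilde{U} = (p \otimes h)^H (\tilde{u} \otimes a) = (p^H \tilde{u})(h^H a) = \sqrt{P}\, N \alpha \tilde{\beta} \tag{15} $$

It can be readily shown that

$$ E[\tilde{u}\tilde{u}^H] = e^{-\sigma_\Delta^2}\, u u^H + \frac{1}{P}\,(1 - e^{-\sigma_\Delta^2})\, I_P \tag{16} $$

Hence,

$$ E[y_1] = PN\,(1 - |\alpha|^2 E[|\tilde{\beta}|^2]) = PN \left[ 1 - |\alpha|^2 \left( e^{-\sigma_\Delta^2} |\beta|^2 + \frac{1 - e^{-\sigma_\Delta^2}}{P} \right) \right] \tag{17} $$

With the noise assumption in (2),

$$ E[y_2] = 0 \tag{18} $$

Similar to $y_1$, the mean value of $y_3$ is obtained as

$$
\begin{aligned}
E[y_3] &= E[X_s^H \tilde{V} X_u] = A\sqrt{P} \left( X_s^H U - \frac{1}{N}\, E[X_s^H \tilde{U}\tilde{U}^H U] \right) \\
&= A\sqrt{P} \left( X_s^H U - N \alpha\, E[p^T \tilde{u}\tilde{u}^H u] \right) \\
&= A N \alpha \beta\, (1 - e^{-\sigma_\Delta^2})(P - 1)
\end{aligned} \tag{19}
$$

From (17)-(19), the mean value of y can be calculated by the sum

$$ E[y] = E[y_1] + E[y_2] + E[y_3] \tag{20} $$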
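The mean values in (17)-(20) depend only on the block length P, the array size N, the jammer amplitude A, the two cross-correlation coefficients, and the phase-error variance, so they are easy to tabulate. The sketch below evaluates them as read from the expressions above; the function name and the numeric inputs are assumptions made for the example, not values from the paper's simulations.

```python
import math

def mean_terms(P, N, A, alpha, beta, var_phase):
    """Means of y1, y2, y3 following (17), (18) and (19).

    alpha: spatial cross-correlation coefficient (complex)
    beta:  exact temporal cross-correlation coefficient (complex)
    var_phase: variance of the i.i.d. Gaussian phase errors
    """
    e = math.exp(-var_phase)
    mean_beta_sq = e * abs(beta) ** 2 + (1.0 - e) / P       # E[|beta~|^2], from (16)
    mean_y1 = P * N * (1.0 - abs(alpha) ** 2 * mean_beta_sq)  # (17)
    mean_y2 = 0.0                                             # (18)
    mean_y3 = A * N * alpha * beta * (1.0 - e) * (P - 1)      # (19)
    return mean_y1, mean_y2, mean_y3

# Assumed example values.
P, N, A = 1023, 2, 10.0
alpha, beta = 0.5 + 0.2j, 0.03j

for var_phase in (0.0, 0.005, 0.01):
    m1, m2, m3 = mean_terms(P, N, A, alpha, beta, var_phase)
    print(var_phase, m1, m3, m1 + m2 + m3)                    # last column is (20)
```

With zero phase-error variance the jammer term vanishes and the signal term reduces to $PN(1 - |\alpha|^2 |\beta|^2)$, the value obtained with exact IF knowledge.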


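The mean square value due to the signal is

$$
\begin{aligned}
E[|y_1|^2] &= E[X_s^H \tilde{V} X_s X_s^H \tilde{V} X_s] \\
&= E\left[P^2N^2 - 2P\, X_s^H \tilde{U}\tilde{U}^H X_s + \frac{1}{N^2}\, X_s^H \tilde{U}\tilde{U}^H X_s\, X_s^H \tilde{U}\tilde{U}^H X_s\right] \\
&= P^2N^2 - 2N^2P|\alpha|^2 E[p^T \tilde{u}\tilde{u}^H p] + N^2|\alpha|^4 E[p^T \tilde{u}\tilde{u}^H p\, p^T \tilde{u}\tilde{u}^H p] \\
&= P^2N^2 - 2N^2P|\alpha|^2 \left(e^{-\sigma_\Delta^2} P|\beta|^2 + 1 - e^{-\sigma_\Delta^2}\right) \\
&\quad + N^2|\alpha|^4 \left[\left(2 + 4P|\beta|^2 e^{-\sigma_\Delta^2}\right)\left(1 - e^{-\sigma_\Delta^2}\right) + P^2|\beta|^4 e^{-2\sigma_\Delta^2}\right]
\end{aligned} \tag{21}
$$

In deriving the above expression, $p^T \tilde{u}$ is approximated by a complex Gaussian random variable using the Central Limit Theorem. From (17) and (21), the variance of $y_1$ can be computed as

$$ \sigma_{y_1}^2 = E[y_1^2] - E^2[y_1] \tag{22} $$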


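The mean square values of $y_2$ and $y_3$ are given by

$$
\begin{aligned}
E[|y_2|^2] = \sigma_{y_2}^2 &= E[X_s^H \tilde{V} W W^H \tilde{V} X_s] \\
&= E[E[X_s^H \tilde{V} W W^H \tilde{V} X_s \mid \tilde{V}]] \\
&= \sigma^2 E[X_s^H \tilde{V}\tilde{V} X_s] \\
&= \sigma^2 E[X_s^H \tilde{V} X_s] = \sigma^2 E[y_1] \\
&= \sigma^2 PN \left[ 1 - |\alpha|^2 \left( e^{-\sigma_\Delta^2} |\beta|^2 + \frac{1 - e^{-\sigma_\Delta^2}}{P} \right) \right]
\end{aligned} \tag{23}
$$

$$
\begin{aligned}
E[|y_3|^2] &= E[X_u^H \tilde{V} X_s X_s^H \tilde{V} X_u] = A^2 P\, E[U^H \tilde{V} X_s X_s^H \tilde{V} U] \\
&= A^2 P\, E\Big[U^H X_s X_s^H U - \frac{1}{N}\, U^H X_s X_s^H \tilde{U}\tilde{U}^H U \\
&\qquad - \frac{1}{N}\, U^H \tilde{U}\tilde{U}^H X_s X_s^H U + \frac{1}{N^2}\, U^H \tilde{U}\tilde{U}^H X_s X_s^H \tilde{U}\tilde{U}^H U\Big] \\
&= A^2 P \big\{ N^2 P |\alpha|^2 |\beta|^2 - N^2 |\alpha|^2 \sqrt{P}\, \beta\, E[u^H \tilde{u}\tilde{u}^H p] \\
&\qquad - N^2 |\alpha|^2 \sqrt{P}\, \beta^* E[p^H \tilde{u}\tilde{u}^H u] + N^2 |\alpha|^2 E[u^H \tilde{u}\tilde{u}^H p\, p^H \tilde{u}\tilde{u}^H u] \big\} \\
&= A^2 N^2 P |\alpha|^2 \big\{ P|\beta|^2 - 2P|\beta|^2 \left[e^{-\sigma_\Delta^2} + (1 - e^{-\sigma_\Delta^2})/P\right] \\
&\qquad + \left[e^{-\sigma_\Delta^2} + (1 - e^{-\sigma_\Delta^2})/P\right]\left(P|\beta|^2 e^{-\sigma_\Delta^2} + 1 - e^{-\sigma_\Delta^2}\right) \big\}
\end{aligned} \tag{24}
$$

Therefore, the variance of $y_3$ is given by

$$ \sigma_{y_3}^2 = E[|y_3|^2] - |E[y_3]|^2 \tag{25} $$

It can be shown that the covariances between $y_i$ and $y_j$ ($i \ne j$; i, j = 1, 2, 3) assume small values relative to the respective variance values. The variance of the decision variable can therefore be approximated by

$$ \sigma_y^2 \approx \sigma_{y_1}^2 + \sigma_{y_2}^2 + \sigma_{y_3}^2 \tag{26} $$

The above equations are derived for only one block of the signal symbol. Below, the subscript m is added to identify y with block m (m = 1, 2, ..., Q), and should not be confused with the subscripts used in (11). The decision variable $y_r$ can be expressed as

$$ y_r = \Re\left[\sum_{m=1}^{Q} y_m\right] \tag{27} $$

Since the white Gaussian noise and the estimation errors are independent for different blocks, the expected value and variance of $y_r$ become

$$ E[y_r] = \sum_{m=1}^{Q} \Re[E[y_m]], \qquad \mathrm{Var}[y_r] = \sum_{m=1}^{Q} \mathrm{Var}[y_m] \tag{28} $$

The above expressions can now be used to generate the desired receiver SINR,

$$ \mathrm{SINR} = \frac{E^2[y_r]}{\mathrm{Var}[y_r]} \tag{29} $$
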


4.  SIMULATIONS 

Computer  simulations  of  the  receiver  performance  are  presented 
based  on  the  following  parameters:  the  PN  code  is  the  Gold  code 
of  satellite  SV  #1,  and  the  jammer  is  a  periodic  linear  FM  signal 
with normalized frequency range [0, π]. The jammer has a period
equal  to  the  C/A  code  block  length.  The  angle  of  arrival  (AOA) 
of  the  satellite  signal  is  A  two-element  array  is  considered 
with  half-wavelength  spacing.  The  Jammer-to-Signal-Ratio 
(JSR)  is  set  to  50  dB  and  SNR  equal  to  -20  dB. 

Figure 1 depicts the simulated values of the receiver SINR vs. the phase error variance $\sigma_\Delta^2$, which changes in the range [0, 0.01] for all blocks. The AOAs of the jammer signals are set to 5°, 35°, and 65°, respectively. It is clear from the figure that, as the error variance increases, the output SINR decreases. The SINR of the single sensor case is also plotted for comparison. Unlike the result of exact IF estimation, where antenna arrays bring a constant 3 dB array gain [1], the receiver SINR in the presence of those errors is dependent on the spatial signatures of the signal and jammer. For small spatial cross-correlation coefficients, the use of an antenna array allows the receiver to be more robust to the IF estimation errors. The relation between the receiver SINR and the jammer AOA is shown in Fig. 2. In this case, the phase error variance $\sigma_\Delta^2$ is kept constant at 0.01.


5.  CONCLUSION 

Subspace projection is a pre-correlation technique that can reject instantaneously narrowband interference effectively. Using antenna arrays can provide further improvement in the receiver SINR performance by exploiting both temporal and spatial signatures. However, as the subspace projection matrix is solely dependent on the IF estimate, IF estimation errors will perturb the projection matrix and allow part of the jammer power to escape the projection operation.

In this paper, the SINR performance of a GPS receiver using array subspace projection in the presence of IF estimation errors has been analyzed. The phase errors are modeled as zero-mean white Gaussian, and independent over different chips. The spatial signature is assumed to be accurately estimated. The analysis and simulations show that, although the IF estimation errors can affect the receiver SINR significantly, the combination of temporal and spatial signatures can provide more robustness in the presence of IF estimation errors, and as such, render better performance than the single antenna scheme.
ABSTRACT

In radio telescope arrays, the complex receiver gains and sensor noise powers are initially unknown and have to be calibrated. Gain calibration enhances the quality of astronomical sky images and, moreover, improves the effectiveness of certain radio telescope phased-array data processing techniques, such as radio frequency interference (RFI) mitigation and beamforming. In this paper we present several closed form and iterative complex gain estimation methods. These methods are analyzed and compared to the Cramer-Rao lower bound for the variance of the estimated gain. The models are tested both on simulated data and on observed telescope data.


Fig. 1. Radio telescope array. [Figure label: sky source]


Keywords: applications in radio astronomy, sensor array processing.

1.  INTRODUCTION 

Gain calibration techniques for radio telescope systems have
existed for a long time [1][2]. However, since studies started for
a next generation of radio telescopes (the Square Kilometer Array
radio telescope or SKA [3]), phased array beamforming issues
and radio frequency interference (RFI) suppression techniques have
received renewed interest [4] in radio astronomy. For RFI
suppression, and for phased array beamforming, gain calibration of
the telescope array is an important factor. Maximum likelihood
techniques exist for estimation of the gain and phase of signals
impinging on the telescope array [5] and for estimation of the
direction of arrival of the impinging signals [6]. For computational
reasons (SKA will have many sensor elements) and for robustness
reasons (iterative maximum likelihood techniques depend on a good
initial point) we investigated several closed form and iterative
complex gain estimation methods and found that these techniques
perform well.

The complex gains and noise powers of the individual telescopes
of a telescope array (figure 1) can be estimated by observing a
strong astronomical source in the centre of the field of view of the
telescopes. In most cases, single point sources can be found which
dominate the field of view of a radio telescope. A telescope output
signal is the sum of the telescope system noise (uncorrelated
among the telescopes) and the astronomical source flux, which is
correlated, multiplied by the telescope gain. The source flux is the
same for each of the telescopes, but the telescope gains and noise
powers usually are not. The gains consist of the combined effect of
atmospheric disturbances, telescope geometry, receiver
characteristics, and electronic (amplifier) gains, whereas the
system noise powers can differ by several dB.

A.J. Boonstra was supported by the NOEMI project of the STW under
contract no. DEL77-4476.

The output of the backend processing is a sequence of covariance
matrices formed by cross correlation of all the telescope outputs
$x_i$. The aim in this paper is to estimate the complex gain
factors and the system noise powers from an observed covariance
matrix, assuming that the astronomical source flux is known from
tables. We present three algorithms to extract these parameters.

2.  DATA  MODEL 

Assume that during the calibration observations the telescopes are
pointed at a single radio source in the sky. For a telescope array
(figure 1) the output $x_i(t)$ of element $i$ at a certain time $t$
can be modeled (using the narrow band assumption) as

$$x_i(t) = g_i a_i s(t) + n_i(t) \tag{1}$$

where $g_i$ is the complex gain of the sensor, $n_i$ is the system
noise of channel $i$, $a_i$ is the narrow band phase offset due to
the geometric delay, and $s(t)$ is the flux of the impinging external
source. For the gain calibration observation, the sky source is
located in the centre of the field of view. The geometry and look
direction of a telescope are known, so the narrow band phase offset
due to the geometric delay is known as well and can be compensated
for. Hence without loss of generality we may assume in our model that
the phase offsets are $a_i = 1$.

In radio astronomy, the sensor array output

$$\mathbf{x}(t) = [x_1(t), \cdots, x_p(t)]^t \tag{2}$$

is usually correlated with itself to form a covariance matrix. Here
the superscript $t$ means the transpose operator, and $p$ is the
number of telescopes. The true covariance matrix $\mathbf{R}$ and its
estimate $\hat{\mathbf{R}}$ based on $N$ samples, assuming
stationarity over this interval, are given by

$$\mathbf{R} = E\{\mathbf{x}(t)\mathbf{x}(t)^H\} \tag{3}$$

$$\hat{\mathbf{R}} = \frac{1}{N} \sum_{n=0}^{N-1} \mathbf{x}(t+nT)\,\mathbf{x}(t+nT)^H \tag{4}$$

where superscript $H$ denotes the complex conjugate transpose.
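As an illustration of equations (1) to (4), the following is a minimal NumPy sketch that draws snapshots from the data model with $a_i = 1$ and forms the sample covariance matrix. The helper names (`complex_normal`, `simulate_snapshots`, `sample_covariance`) and the example gains and noise powers are hypothetical choices made for this sketch; they do not come from the paper.

```python
import numpy as np

def complex_normal(rng, *shape):
    """Circular complex Gaussian samples with unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def simulate_snapshots(g, d, n_samples, rng):
    """Columns are x(t + nT) = g s + n, with E{|s|^2} = 1 and noise powers d."""
    s = complex_normal(rng, n_samples)
    noise = np.sqrt(d)[:, None] * complex_normal(rng, g.size, n_samples)
    return g[:, None] * s[None, :] + noise

def sample_covariance(x):
    """Eq. (4): average of the outer products x x^H over the N snapshots."""
    return x @ x.conj().T / x.shape[1]

rng = np.random.default_rng(0)
g = np.array([1.0, 0.9 * np.exp(0.3j), 1.1 * np.exp(-1.0j)])  # example gains (assumed)
d = np.array([10.0, 12.0, 8.0])                               # example noise powers (assumed)
R_hat = sample_covariance(simulate_snapshots(g, d, 100_000, rng))
```

With many snapshots, `R_hat` should approach the rank-one-plus-diagonal structure that the gain estimation methods exploit.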

The signal power $\sigma_s^2 = E\{|s(t)|^2\}$ is known from tables,
hence without loss of generality we may model it as $\sigma_s^2 = 1$.
The covariance matrix can now be written as

$$\mathbf{R} = \mathbf{g}\mathbf{g}^H + \mathbf{D} \tag{5}$$

where $\mathbf{D}$ is a diagonal matrix containing the system noise
contributions, $d_i = E\{|n_i(t)|^2\} > 0$. The gain vector
$\mathbf{g}$ can be written as a product of a gain magnitude
$\boldsymbol{\gamma} = [\gamma_1, \cdots, \gamma_p]^t$ ($\gamma_i > 0$)
and a phasor $\mathbf{z} = [e^{j\phi_1}, \cdots, e^{j\phi_p}]^t$; i.e.
$\mathbf{g} = \boldsymbol{\gamma} \odot \mathbf{z}$, where $\odot$ is
the Schur-Hadamard (elementwise) matrix product. The $ij$-th element
of $\mathbf{R}$ is thus given by

$$r_{ij} = \gamma_i \gamma_j e^{j(\phi_i - \phi_j)} + d_i \delta_{ij} \tag{6}$$

Since the phases are underdetermined, we define without loss of
generality the phase of the first sensor to be zero: $\phi_1 = 0$. The
objective at this point is, given $\mathbf{R}$, to estimate
$\mathbf{g}$ and $\mathbf{D}$ according to the model (5).

3. GAIN DECOMPOSITION ALGORITHMS
3.1. Alternating least squares gain estimation (ALS)

The covariance matrix in equation (5) is composed of a rank-one
matrix $\mathbf{g}\mathbf{g}^H$ and a diagonal matrix $\mathbf{D}$.
The gain extraction procedure is based on minimizing the model error:

$$\{\hat{\mathbf{g}}, \hat{\mathbf{D}}\} = \arg\min_{\mathbf{g},\, \mathbf{D} > 0} \|\mathbf{R} - \mathbf{D} - \mathbf{g}\mathbf{g}^H\|_F^2 \tag{7}$$

where $\|\cdot\|_F$ denotes the Frobenius norm. In the ALS technique,
we alternatingly minimize over one component, keeping the other
component fixed. In particular, assume at the $k$-th iteration that we
have an estimate $\hat{\mathbf{D}}^{(k)}$. The next step is to
minimize equation (7) with respect to the gain vector only:

$$\hat{\mathbf{g}}^{(k)} = \arg\min_{\mathbf{g}} \|\mathbf{R} - \mathbf{g}\mathbf{g}^H - \hat{\mathbf{D}}^{(k)}\|_F^2 \tag{8}$$

The minimum is found from the eigenvalue decomposition of
$\mathbf{R} - \hat{\mathbf{D}}^{(k)}$,

$$\mathbf{R} - \hat{\mathbf{D}}^{(k)} = \mathbf{U}^{(k)} \boldsymbol{\Lambda}^{(k)} \mathbf{U}^{(k)H} \tag{9}$$

where the matrix $\mathbf{U} = [\mathbf{u}_1 \cdots \mathbf{u}_p]$
contains the eigenvectors $\mathbf{u}_i$, and $\boldsymbol{\Lambda}$
is a diagonal matrix containing the eigenvalues $\lambda_i$. The
gain estimate minimizing (8) is given by

$$\hat{\mathbf{g}}^{(k)} = \mathbf{u}_1 \sqrt{\lambda_1} \tag{10}$$

where $\lambda_1$ is the largest eigenvalue, and $\mathbf{u}_1$ is the
corresponding eigenvector. The second step is to minimize (7) with
respect to the system noise matrix $\mathbf{D}$, keeping the gain
vector fixed. The minimum is obtained by subtracting
$\hat{\mathbf{g}}^{(k)} \hat{\mathbf{g}}^{(k)H}$ from $\mathbf{R}$ and
discarding all off-diagonal elements. The condition that the diagonal
elements of $\hat{\mathbf{D}}^{(k+1)}$ should be positive is
implemented by subsequently setting
$\hat{d}_i^{(k+1)} = \max(\hat{d}_i^{(k+1)}, 0)$. The two minimization
steps are repeated until the model error (7) converges. Since each of
the minimizing steps in the iteration loop reduces the model error,
we obtain monotonic convergence to a local minimum. Although
the iteration is very simple to implement, simulations indicate that
convergence usually is very slow, especially in the absence of a
reasonable initial point.

3.2. Column ratio gain estimation (COL)


We now set out to find a closed form estimate of $\mathbf{g}$, which
recovers $\mathbf{g}$ exactly when applied to $\mathbf{R}$ (and hence
is asymptotically exact when applied to $\hat{\mathbf{R}}$). The crux
of this method is the observation that the off-diagonal entries of
$\mathbf{g}\mathbf{g}^H$ are equal to those of $\mathbf{R}$, hence
known, so that we only need to reconstruct the diagonal entries of
$\mathbf{g}\mathbf{g}^H$. This can be done in closed form by
estimating the column ratios of $\mathbf{R}$ away from the diagonal
as discussed below. The diagonal of the covariance matrix
$\mathbf{R}$ is then replaced with the estimate, producing a matrix
of the form $\mathbf{R}' = \mathbf{g}\mathbf{g}^H$. The gain vector
$\mathbf{g}$ can then be extracted by an eigenvalue decomposition of
$\mathbf{R}'$.

The ratio $\alpha_{ij}$ of two elements $g_i$ and $g_j$ of the complex
gain vector $\mathbf{g}$ can be estimated from the data $\mathbf{R}$
by solving

$$\mathbf{c}_i = \alpha_{ij} \mathbf{c}_j \tag{11}$$

where $\mathbf{c}_i$ and $\mathbf{c}_j$ are the $i$-th and $j$-th
columns of the matrix $\mathbf{R}$, not including the entries
$r_{ii}$, $r_{ij}$, $r_{ji}$, and $r_{jj}$, because $r_{ii}$ and
$r_{jj}$ also contain the unknown system noise contributions $d_i$.
Solving for $\alpha_{ij}$ in the least squares sense gives

$$\alpha_{ij} = (\mathbf{c}_j^H \mathbf{c}_j)^{-1} \mathbf{c}_j^H \mathbf{c}_i = \frac{\sum_{k \neq i,j} \bar{r}_{kj} r_{ki}}{\sum_{k \neq i,j} \bar{r}_{kj} r_{kj}} \tag{12}$$

We can subsequently estimate $|g_i|^2$ as
$|\hat{g}_i|^2 = \mathrm{Re}(\alpha_{ij} r_{ij})$, for any
choice of $j$. This estimate can be improved if all $(p - 1)$
available column ratios are used. The next step is to form
$\mathbf{R}'$ equal to $\mathbf{R}$ but with the diagonal entries
replaced by the estimates of $|g_i|^2$ obtained above. The resulting
matrix $\mathbf{R}'$ is an estimate of $\mathbf{g}\mathbf{g}^H$, and
$\mathbf{g}$ is found from an eigenvalue decomposition
$\mathbf{R}' = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^H$,
as in (9), (10) before. In the actual algorithm, we follow
the same procedure but replace $\mathbf{R}$ by the sample estimate
$\hat{\mathbf{R}}$.
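The column ratio procedure can be sketched in NumPy as follows. This is an illustrative reading of Section 3.2, not the authors' implementation: the function name `col_gain_estimate` is hypothetical, and averaging the $(p-1)$ available estimates of $|g_i|^2$ is one simple way to "use all column ratios" that is assumed here. At least three channels are needed so that each column retains entries after the exclusions.

```python
import numpy as np

def col_gain_estimate(R):
    """Closed-form gain estimate from a covariance matrix R = g g^H + D."""
    p = R.shape[0]
    g_abs2 = np.zeros(p)
    for i in range(p):
        estimates = []
        for j in range(p):
            if j == i:
                continue
            # Drop rows i and j: those entries involve the diagonal of R.
            rows = [k for k in range(p) if k not in (i, j)]
            c_i, c_j = R[rows, i], R[rows, j]
            # Least squares solution of c_i = alpha * c_j, eq. (12).
            alpha = np.vdot(c_j, c_i) / np.vdot(c_j, c_j)
            estimates.append((alpha * R[i, j]).real)
        g_abs2[i] = np.mean(estimates)  # assumed way of combining the ratios
    # Replace the diagonal so that R' approximates g g^H.
    R_prime = np.array(R, dtype=complex)
    R_prime[np.diag_indices(p)] = g_abs2
    lam, U = np.linalg.eigh(R_prime)    # eigenvalues in ascending order
    g_hat = np.sqrt(max(lam[-1], 0.0)) * U[:, -1]
    # Fix the phase ambiguity: phase of the first sensor is zero.
    return g_hat * np.exp(-1j * np.angle(g_hat[0]))

g = np.array([1.0, 0.9 * np.exp(0.3j), 1.1 * np.exp(-1.0j), 1.05 * np.exp(2.0j)])
R = np.outer(g, g.conj()) + np.diag([10.0, 12.0, 8.0, 9.0])
g_hat = col_gain_estimate(R)
```

Applied to the exact model covariance, as in this example, the estimate reproduces `g`; applied to a sample covariance it is only approximate.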

3.3. Logarithmic least squares gain estimation (LOGLS)

An alternative closed form estimate (as in use at the Westerbork
Synthesis Radio Telescope, WSRT [1], since 1980) is obtained by
minimizing the mean square error of the logarithms of (6). Taking
the logarithm has the effect that the equations become linear, as
products of gains become sums. We start by taking the logarithm
of the off-diagonal elements ($\forall\, i \neq j$) of equation (6)
and define the logarithmic model errors $\Delta_{ij}$
($\forall\, i \neq j$) as

$$\Delta_{ij} = \ln(\hat{r}_{ij}) - \ln(\gamma_i) - \ln(\gamma_j) - j(\phi_i - \phi_j) \bmod 2\pi j \tag{13}$$

Minimization in the least squares sense of the sum-squared error
$\sum |\Delta_{ij}|^2$ over the real gains and phases is obtained by
setting

$$\frac{\partial}{\partial \ln(\gamma_k)} \sum_{i,j=1,\, i \neq j}^{p} |\Delta_{ij}|^2 = 0, \qquad \frac{\partial}{\partial \phi_k} \sum_{i,j=1,\, i \neq j}^{p} |\Delta_{ij}|^2 = 0 \tag{14}$$

After some manipulations the equation for the gain magnitude (14)
becomes:

$$\sum_{i=1,\, i \neq k}^{p} \left( \ln|\hat{r}_{ik}| - \ln(\gamma_i) - \ln(\gamma_k) \right) = 0 \tag{15}$$

for $k = 1, \cdots, p$. This equation can easily be written in matrix
form and solved in closed form using Woodbury's identity. The
same procedure leads to a closed form solution for the phase. In
this method, phase unwrapping is necessary. This is done by using
a simple phase quadrant estimation procedure.
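For the gain magnitudes, equation (15) can be written as $((p-2)\mathbf{I} + \mathbf{1}\mathbf{1}^t)\,\mathbf{x} = \mathbf{b}$ with $x_k = \ln\gamma_k$ and $b_k = \sum_{i \neq k} \ln|\hat{r}_{ik}|$, and Woodbury's identity gives $\mathbf{x} = (\mathbf{b} - \mathbf{1}\,\mathbf{1}^t\mathbf{b}/(2p-2))/(p-2)$. The sketch below implements only this magnitude step, under that reading of (15); the function name `logls_gain_magnitude` is hypothetical, and the phase step with its unwrapping is not shown.

```python
import numpy as np

def logls_gain_magnitude(R):
    """Closed-form log least squares estimate of the gain magnitudes (p >= 3)."""
    p = R.shape[0]
    off_diag = ~np.eye(p, dtype=bool)
    log_abs = np.zeros((p, p))
    log_abs[off_diag] = np.log(np.abs(R[off_diag]))
    # b_k = sum over i != k of ln|r_ik| (the diagonal contributes zero).
    b = log_abs.sum(axis=0)
    # Solve ((p-2) I + 1 1^t) x = b via Woodbury's identity.
    x = (b - b.sum() / (2 * p - 2)) / (p - 2)
    return np.exp(x)

g = np.array([1.0, 0.9 * np.exp(0.3j), 1.1 * np.exp(-1.0j), 1.05 * np.exp(2.0j)])
R = np.outer(g, g.conj()) + np.diag([10.0, 12.0, 8.0, 9.0])
gamma_hat = logls_gain_magnitude(R)
```

Because only off-diagonal entries are used, the unknown noise powers on the diagonal do not enter the estimate.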

4.  GAIN  ESTIMATION  SIMULATIONS 

4.1.  Method 

The aim of the simulations is to evaluate the gain estimation
accuracy as a function of signal to noise ratio
($\mathrm{SNR}_i = \mathbf{g}^H\mathbf{g}/d_i$), i.e. the ratio of the
astronomical source power (normalized here to unity) and array gain to
the noise power in the $i$-th channel.

In the simulations we use eight telescope channels. The gain
magnitude was kept fixed during the simulations, and was taken
as a nominal value plus a (uniformly selected) random deviation
of 10% of the nominal value. The gain phase was randomly
distributed in the interval $[0, 2\pi]$ and also kept fixed during
the simulations. In the presentation of the results, we split the
gain estimates into a magnitude and a phase, since they might have
different accuracies, and since the Cramer-Rao bounds are based on
these (real) parameters.
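A single run of such a simulation, combined with the ALS iteration of Section 3.1, might look like the sketch below. Several details are assumptions made for illustration only: the all-zero starting point for the noise powers, the fixed iteration count, the equal noise power in every channel set from a nominal per-channel SNR, and the helper names.

```python
import numpy as np

def complex_normal(rng, *shape):
    """Circular complex Gaussian samples with unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def als_gain_estimate(R, n_iter=100):
    """Alternate between the rank-one gain step and the diagonal noise step."""
    p = R.shape[0]
    d = np.zeros(p)                       # assumed initial point D^(0) = 0
    errors = []
    for _ in range(n_iter):
        # Gain step: best rank-one approximation of R - D.
        lam, U = np.linalg.eigh(R - np.diag(d))
        g = np.sqrt(max(lam[-1], 0.0)) * U[:, -1]
        # Noise step: diagonal of R - g g^H, clipped at zero.
        d = np.maximum(np.real(np.diag(R)) - np.abs(g) ** 2, 0.0)
        model_error = R - np.diag(d) - np.outer(g, g.conj())
        errors.append(np.linalg.norm(model_error) ** 2)
    g = g * np.exp(-1j * np.angle(g[0]))  # phase of the first sensor is zero
    return g, d, errors

rng = np.random.default_rng(1)
p, n_samples, snr_db = 8, 16384, -10.0
gamma = 1.0 + 0.1 * rng.uniform(-1.0, 1.0, p)   # nominal value plus 10% deviation
phi = rng.uniform(0.0, 2.0 * np.pi, p)
phi[0] = 0.0
g_true = gamma * np.exp(1j * phi)
d_true = np.full(p, 10.0 ** (-snr_db / 10.0))   # assumed: same noise power per channel
x = (g_true[:, None] * complex_normal(rng, n_samples)[None, :]
     + np.sqrt(d_true)[:, None] * complex_normal(rng, p, n_samples))
R_hat = x @ x.conj().T / n_samples
g_hat, d_hat, errors = als_gain_estimate(R_hat)
```

The recorded `errors` sequence is non-increasing, which reflects the monotonic convergence argument given for the ALS iteration; how close `g_hat` comes to `g_true` depends on the SNR, the number of samples, and the number of iterations.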

4.2.  Cramer-Rao  lower  bound  of  the  gain  estimates 

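The Cramer-Rao Bound (CRB) gives a lower bound on the variance
of any unbiased estimator [7]. In our situation, we assume
that the source signal and the channel noise are independent
Gaussian distributed with zero mean, and satisfy the model in
equation (5). Define the parameter vector
$\boldsymbol{\theta} = [\gamma_1, \cdots, \gamma_p, \phi_2, \cdots, \phi_p, d_1, \cdots, d_p]^t$.
(Note that the phase $\phi_1$ of the first sensor is not a parameter.)
The CRB is then given by [7]

$$\mathrm{var}(\hat{\theta}_i(\mathbf{X}) \,|\, \boldsymbol{\theta}) \geq [\mathbf{I}_F^{-1}]_{ii} \tag{16}$$

where $\mathbf{I}_F$ is the Fisher information matrix, $\mathbf{X}$ is
defined as $\mathbf{X} = (\mathbf{x}[1] \cdots \mathbf{x}[N])$, and
$N$ is the number of samples. Following standard techniques [7], the
Fisher information matrix can be written as

$$I_{F,ij}(\boldsymbol{\theta}) = N\, \mathrm{tr}\left( \mathbf{R}^{-1} \frac{\partial \mathbf{R}}{\partial \theta_i} \mathbf{R}^{-1} \frac{\partial \mathbf{R}}{\partial \theta_j} \right) \tag{17}$$

Inserting the model (5), the components in the Fisher information
matrix can easily be found as

$$\mathbf{R}^{-1} = \mathbf{D}^{-1} \left( \mathbf{I} - \frac{\mathbf{g}\mathbf{g}^H \mathbf{D}^{-1}}{1 + \mathbf{g}^H \mathbf{D}^{-1} \mathbf{g}} \right) \tag{18}$$

$$\partial \mathbf{R} / \partial \gamma_i = (\mathbf{e}_i \odot \mathbf{z})\, \mathbf{g}^H + \mathbf{g}\, (\mathbf{e}_i \odot \mathbf{z})^H \tag{19}$$

$$\partial \mathbf{R} / \partial \phi_i = j \left( (\mathbf{g} \odot \mathbf{e}_i)\, \mathbf{g}^H - \mathbf{g}\, (\mathbf{g} \odot \mathbf{e}_i)^H \right) \tag{20}$$

$$\partial \mathbf{R} / \partial d_i = \mathbf{e}_i \mathbf{e}_i^t \tag{21}$$

where $\mathbf{e}_i$ denotes the $i$-th unit vector. The estimation
variance of the model parameters is calculated by evaluating
equation (16).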
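Equations (16) to (21) translate fairly directly into a numerical evaluation of the bound. The following NumPy sketch is one way to do so; the function name `gain_crb` and the example parameter values are hypothetical, and at least three channels are assumed so that the Fisher information matrix is invertible.

```python
import numpy as np

def gain_crb(gamma, phi, d, n_samples):
    """CRB for theta = [gamma_1..gamma_p, phi_2..phi_p, d_1..d_p]."""
    p = gamma.size
    eye = np.eye(p)
    z = np.exp(1j * phi)
    g = gamma * z
    # Eq. (18): inverse of R = g g^H + D via Woodbury's identity.
    D_inv = np.diag(1.0 / d)
    denom = 1.0 + (g.conj() @ D_inv @ g).real
    R_inv = D_inv @ (eye - np.outer(g, g.conj()) @ D_inv / denom)
    derivs = []
    for i in range(p):                    # eq. (19), derivative w.r.t. gamma_i
        v = eye[:, i] * z
        derivs.append(np.outer(v, g.conj()) + np.outer(g, v.conj()))
    for i in range(1, p):                 # eq. (20); phi_1 is not a parameter
        v = g * eye[:, i]
        derivs.append(1j * (np.outer(v, g.conj()) - np.outer(g, v.conj())))
    for i in range(p):                    # eq. (21), derivative w.r.t. d_i
        derivs.append(np.outer(eye[:, i], eye[:, i]))
    m = len(derivs)
    fisher = np.empty((m, m))
    for a in range(m):                    # eq. (17)
        for b in range(m):
            fisher[a, b] = n_samples * np.trace(
                R_inv @ derivs[a] @ R_inv @ derivs[b]).real
    return np.diag(np.linalg.inv(fisher))  # eq. (16)

gamma = np.array([1.0, 0.9, 1.1, 1.05])   # example values (assumed)
phi = np.array([0.0, 0.4, -1.2, 2.0])
d = np.array([10.0, 12.0, 8.0, 9.0])
crb_1k = gain_crb(gamma, phi, d, 1000)
crb_2k = gain_crb(gamma, phi, d, 2000)
```

The returned vector is ordered like the parameter vector, so its first $p$ entries bound the gain magnitude variances; doubling the number of samples halves every entry.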


4.3. Comparison of the gain decomposition methods: simulation
results

For a typical online gain calibration measurement at a radio
observatory, astronomical sources are used with noise powers in the
range of 0.1 to 10% of the telescope system noise power. The
integration time of the correlation data can be several seconds to a
few minutes.

Figure 2 shows the results of a gain estimation simulation in
which the gain estimation variance is plotted versus SNR for a
fixed number of samples. The three methods are plotted together
with the Cramer-Rao lower bound. In the -10 to 0 dB SNR range,
the gain estimation errors lie very close to the CRB (for 16 k
samples) and the estimators are unbiased. Below an SNR of -15 dB
the gain estimation starts to deviate from the bound. The ALS
method breaks down at a higher SNR than the other two methods.

The phase estimation tends to break down earlier than the gain
magnitude estimation. The phase breakdown point is observed
around -16 dB, and is a bit lower for the LOGLS method. At
low SNR some of the curves drop below the Cramer-Rao bound.
Here, the estimators are biased.


Fig. 2. Gain estimation standard deviation versus SNR
(plot title: Nsam=16384, Nsim=256)


In figure 3 the gain estimates are plotted as a function of the
number of observed time samples for a fixed SNR. Note that here too
the phase estimators break down earlier than the gain magnitude
estimators.

Fig. 3. Gain estimation standard deviation vs. number of samples
(plot title: SNR = -10 dB, Nsim=256)

5.  EXPERIMENTAL  RESULTS 

The gain estimation methods were tested on real telescope data.
An eight channel data recorder was connected to the Westerbork
Synthesis Radio Telescope, which was pointing at the astronomical
source 3C48. Baseband signals were recorded corresponding
to a sky frequency of 1420 MHz. The SNR of the source relative to
the system noise is -13 dB. A narrow band is selected (by means
of an FFT) and covariance matrices are derived by cross correlation
of the input sequences. The observed correlation coefficient is
0.055 with a spread of about 10% due to the different gains of the
telescopes. The gain decomposition algorithms are applied to the
covariance matrices. Figure 4 shows the observed gain magnitude
estimation standard deviation and the CRLB. The entire dataset is
used to obtain a fair estimate of the complex gains; these numbers
are used for the calculation of the CRLB. The curves for the
LOGLS and COL gain estimations lie very close to the CRLB,
just as is the case with the simulations. The ALS method, however,
does not perform well for SNRs in the range below about -15 dB; the
ALS estimates are biased. The small deviation of the LOGLS and
COL curves from the CRLB curve could be caused by the fact
that the CRLB calculations used the estimated gains rather than
the true gains (which are not known).

Fig. 4. Gain estimation standard deviation from observations with
the astronomical source 3C48 (plot title: Gain estimation using the
astronomical source 3C48 at 1420 MHz).

6.  CONCLUSIONS 

In our simulations the three gain estimation methods do not differ
much in performance. The main difference is that the ALS method
for gain magnitude estimation breaks down a bit earlier than the
two other methods. Also, the phase estimation seems to break
down earlier than the gain magnitude estimation. For 16 k samples
and for SNRs higher than -15 dB, the estimators are unbiased (for
the gain distribution used).

In general the measurement results support the conclusions
from the simulations. However, the ALS gain magnitude estimates
deviate 8 dB from the CRLB. The ALS estimator is biased for the
SNR of the observation.

Further research will focus on other methods, such as the
Gauss-Newton iterative method [8], and on processing efficiency;
the methods will also be extended to multiple sources.

ABSTRACT

The detection and diagnosis of gearbox faults is of vital
importance for the safe operation of helicopters. This paper
presents a new approach to identifying gear mesh signals for
early and effective detection of localised gear faults. Using this
approach, the gear mesh signal is identified using a non-minimum
phase autoregressive (AR) model by maximising the
kurtosis of the inverse filter error signal of the model. Sudden
changes in the error signal are usually indications of the
existence of localised gear faults in the monitored gear. It is
demonstrated using well-regarded CH46 helicopter aft
transmission test data that the approach shows great promise for
detecting faults in complex gearboxes.

INTRODUCTION 

To  reduce  life  cycle  cost  and  to  improve  the  reliability  and  safety 
of  helicopters,  fault  diagnosis  schemes  for  their  power 
transmission  systems  are  essential.  In  general,  vibration  diagnosis 
has  been  proven  effective  in  the  detection  and  diagnosis  of  gear 
faults.  However,  vibration  signals  generated  by  helicopter 
transmissions  are  heavily  corrupted  by  noise.  Therefore,  it  is 
essential  to  employ  advanced  signal  enhancement  techniques  to 
extract  useful  diagnostic  information  from  the  measured 
vibration  signals.  The  synchronous  signal  averaging  technique  is 
a  powerful  tool  in  extracting  the  tooth  mesh  signal  of  the  gear  of 
interest  from  the  measured  vibration  signal.  Using  the  averaged 
signal,  various  analysis  techniques  can  be  used  to  further  identify 
the  gear  mesh  signal  in  determining  the  gear's  health  condition. 

For healthy gears, the averaged signal is normally
dominated by the gear meshing harmonics (sinusoids with the
gear meshing frequency, i.e. the tooth number times the shaft
rotational frequency, and its harmonics) modulated by the
rotation of the gear shaft [1]. In most cases, the modulation
waveforms are also sinusoids with lower shaft orders, i.e., 1x
and/or 2x the shaft frequency. In cases where the shaft has
multiple gears, the signal will be more complex due to the cross
gear modulation interaction [2]. When a localised tooth defect
such as a tooth crack is present, the engagement of the cracked
tooth will induce an impulsive change with comparatively low
energy to the gear mesh signal. This can produce some higher
shaft-order modulations and may excite structural resonances.
For the purpose of fault diagnosis, this additive impulse must be
extracted. A common approach, i.e., the conventional residual
signal method, is to use a multiple band-stop filter to remove the
gear meshing harmonics and sometimes their lower shaft-order
modulation sidebands from the gear mesh signal, resulting in a
residual signal that may expose the low energy impulses. Other
approaches focus on the higher shaft-order modulation
sidebands or a structural resonance to extract the information
relevant to local gear faults. However, when dealing with
incipient fault detection and cases involving multiple gears on
the same shaft, or strong unknown (e.g., ghost [1]) components,
the current techniques may become ineffective.
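The conventional residual signal method described above can be illustrated with a short NumPy sketch. It assumes a synchronously averaged signal covering exactly one shaft revolution, so that FFT bin $k$ corresponds to shaft order $k$; the function names, the synthetic test signal and the choice of two sidebands on each side are hypothetical and serve only as an example.

```python
import numpy as np

def normalised_kurtosis(x):
    """Fourth moment over squared second moment (3.0 for a Gaussian signal)."""
    x = x - np.mean(x)
    return np.mean(x ** 4) / np.mean(x ** 2) ** 2

def residual_signal(avg, n_teeth, n_sidebands=2):
    """Zero the mesh harmonics and their nearest shaft-order sidebands."""
    spec = np.fft.rfft(avg)
    for order in range(n_teeth, spec.size, n_teeth):
        lo = max(order - n_sidebands, 0)
        hi = min(order + n_sidebands, spec.size - 1)
        spec[lo:hi + 1] = 0.0
    return np.fft.irfft(spec, n=avg.size)

# Synthetic one-revolution average: 25-tooth mesh with 1x amplitude
# modulation, a second mesh harmonic, and a small impulse near sample 800.
n = 1024
theta = 2.0 * np.pi * np.arange(n) / n
avg = (1.0 + 0.3 * np.cos(theta)) * np.cos(25 * theta) + 0.5 * np.cos(50 * theta)
avg = avg + 0.5 * np.exp(-0.5 * ((np.arange(n) - 800) / 3.0) ** 2)
res = residual_signal(avg, n_teeth=25, n_sidebands=2)
k_avg, k_res = normalised_kurtosis(avg), normalised_kurtosis(res)
```

In this idealised single-gear example the impulse dominates the residual, so its kurtosis rises well above that of the averaged signal; the difficulty discussed in the text arises when other strong components survive the band-stop filtering.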

The minimum-phase autoregressive (MPAR) modelling
approach was studied for gear fault diagnosis at the
Aeronautical and Maritime Research Laboratory (AMRL) [3]-[5].
Under the assumption that gear mesh signals were derived
from an AR system driven by Gaussian noise, an AR model was
established for signals from the monitored gear under healthy
conditions, and then used as a linear prediction error (LPE)
filter. The future signals from the same gear, under healthy or
faulty conditions, were processed by the LPE filter. The output
prediction error of the LPE filter would resemble random noise
if the monitored gear remained in a healthy condition. However,
when a local fault (e.g., a tooth crack) developed in the gear,
the fault-affected region would not be well predicted by the AR
model that was established under healthy conditions. Thus the
LPE signal would reflect the changes caused by the fault. It was
shown that the AR modelling approach outperforms current gear
fault diagnostic techniques, such as the residual signal method.
Using this method, there is no need to know the number of teeth
on the monitored gear, the characteristics of the modulation
waveforms or the number of gears on the same shaft. Moreover,
it is not important whether structural resonances and additional
modulations are present in the signal, whereas this
information is essential to some of the current techniques.

However, in terms of making an early detection of
incipient local gear faults, the MPAR modelling method can still
perform unsatisfactorily. Hence, a more general linear
prediction modelling method, non minimum-phase AR
(NMPAR), has been studied. Here, we assume that gear mesh
signals are derived from a non minimum-phase AR system
driven by non-Gaussian noise. The AR coefficients are
estimated by kurtosis maximisation using the inverse filter
criteria. This technique has been applied to the analysis of the
CH46 helicopter aft transmission seeded fault testing data [6],
with encouraging results.

GEAR  MESH  SIGNAL 

A signal model for the vibration generated by the mesh of a faulty
gear is presented here. The model consists of 1) the gear meshing
vibration with some amplitude and phase modulation effects
caused by geometric and assembly errors, together with speed
and load fluctuations of the gears, and 2) additional modulation
and structure resonant vibration caused by a local gear fault. The
model is expressed as:

$$y(t) = \sum_{m=0}^{M} A_m [1 + a_m(t)] \cos\{2\pi f_m t + \beta_m + b_m(t)\} + d(t) \cos(2\pi f_r t + \theta_r) + v(t) \tag{1}$$

where $A_m$, $f_m$ and $\beta_m$ are the amplitude, frequency and
initial phase of the $m$-th mesh harmonic, respectively. Functions
$a_m(t)$ and $b_m(t)$ are the amplitude and phase modulation functions
at the $m$-th mesh harmonic respectively, in which both the normal
modulation effects created by shaft rotation and those produced
by the fault-induced impact are considered. Function $d(t)$ is the
envelope function of the resonant vibration; $f_r$ is the resonance
frequency (the carrier frequency) and $\theta_r$ the corresponding
initial phase. The last term, $v(t)$, is considered the remaining
noise after signal averaging. In most cases, Eq. (1) represents a
sub-Gaussian signal with normalised kurtosis (see Eq. 4) below 2.80
(3.00 for a Gaussian signal).

Using the conventional residual signal method, the removal
of gear meshing harmonics and lower shaft-order modulation
will produce a residual signal in which the higher shaft-order
modulation and resonance terms remain. If these terms are
significant, the residual signal may become a super-Gaussian
signal. Therefore, the kurtosis as a measure of Gaussianity of the
residual signal can be used as the index for fault detection.

NON MINIMUM-PHASE AR MODELLING

Based on Eq. (1), it is assumed that the gear mesh signal is
generated by an NMPAR system driven by non-Gaussian noise.
Hence, the discrete signal of Eq. (1) can be expressed by

$$y[n] = \sum_{k=1}^{p} a[k]\, y[n-k] + w[n] \tag{2}$$

The input random signal $w[n]$, $n = 1, 2, \ldots, N$, is assumed
non-Gaussian, zero-mean and higher-order white (i.e., independently
identically distributed, i.i.d.), and contains the fault-induced
part in Eq. (1). For stability of non-causal systems, a constraint
for Eq. (2) is

$$A(z) = 1 - \sum_{k=1}^{p} a[k]\, z^{-k} \neq 0, \quad \text{for } |z| = 1, \tag{3}$$

which means that the Z-transform polynomial $A(z)$ parameterised
by $a[k]$ has no roots on the unit circle. In identifying the system
in Eq. (2), we denote by $\varepsilon[n; \hat{a}]$ the estimate of
$w[n]$ obtained by applying the inverse signal model with parameter
$\hat{a}[k]$ (the estimate of $a[k]$) to the signal $y[n]$. In analogy
to channel equalisation problems, the identification process is shown
in Fig. 1.

Theorems developed by Tugnait [7] and Shalvi et al. [8]
show that, under an energy constraint, the kurtosis of the inverse
filter error signal $\varepsilon[n]$ is upper bounded by the kurtosis
of the true random input signal $w[n]$ to the model shown in Eq. (2),
that is $K(\varepsilon[n]) \leq K(w[n])$, with equality if and only if
the parameter estimate $\hat{a}[k]$ equals the true parameter of the
system $a[k]$, $k = 1, 2, \ldots, p$. Based on the theorems, the AR
parameters $a[k]$ can be estimated by maximising the normalised
kurtosis of $\varepsilon[n; \hat{a}]$, with some constraints on the
variance of $\varepsilon[n; \hat{a}]$. In gear diagnosis, the kurtosis
of a zero-mean signal $\varepsilon[n]$ is defined by:

$$K(\varepsilon) = \frac{E\{\varepsilon^4[n]\}}{\left(E\{\varepsilon^2[n]\}\right)^2} \tag{4}$$


Fig.  1.  Blind  identification  of  an  AR  system 


As shown in Eq. (1), the gear mesh signal essentially consists of
amplitude/phase modulated sinusoids. Maximisation of the
above kurtosis alone could lead to an $\varepsilon[n]$ with
repetitive spikes at a period identical to that of the dominant
sinusoid. However, the system to be identified is assumed to have a
non-Gaussian random noise input, whose autocorrelation coefficients
should be minimal. Therefore, we need to put some restrictions on the
autocorrelation coefficients of the inverse filter error
$\varepsilon[n]$, i.e., the estimate of $w[n]$. Furthermore,
constraints on the power of $\varepsilon[n]$ also need to be
considered to ensure convergence. These constraints can be
incorporated into the cost function (refer to [9] for detailed
necessary and sufficient conditions for blind signal recovery). In
our study, the cost function is as follows:

$$\max_{\hat{a}} \{J(\varepsilon[n; \hat{a}])\} = K_\varepsilon - \lambda \log_{10}(C_\varepsilon) - \rho\,(R_\varepsilon) \tag{5}$$

where $C_\varepsilon$ and $R_\varepsilon$ are the variance and
autocorrelation coefficients of the error signal $\varepsilon[n]$,
respectively. The logarithm of $C_\varepsilon$ transforms
$C_\varepsilon$ to a range similar to that of $K_\varepsilon$. The
constants $\lambda$ and $\rho$ can be chosen such that the decrease of
$K_\varepsilon$ and the increases of $C_\varepsilon$ and
$R_\varepsilon$ are equally penalised during optimisation.
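To make the structure of the cost function in Eq. (5) concrete, the sketch below evaluates it for a given coefficient vector. It rests on assumptions that the text does not settle: the sign convention $\varepsilon[n] = y[n] - \sum_k \hat{a}[k]\, y[n-k]$ for the inverse filter, and the aggregation of the autocorrelation coefficients into a single penalty as the sum of their squares over lags 1 to `max_lag`. The function names and the AR(2) test signal are likewise hypothetical.

```python
import numpy as np

def inverse_filter_error(y, a):
    """FIR inverse filter: e[n] = y[n] - sum_k a[k] y[n-k] (assumed convention)."""
    h = np.concatenate(([1.0], -np.asarray(a, dtype=float)))
    return np.convolve(y, h, mode="valid")

def cost(a, y, lam=3.0, rho=6.0, max_lag=10):
    """Kurtosis minus penalties on log-variance and autocorrelation."""
    e = inverse_filter_error(y, a)
    e = e - e.mean()
    var = np.mean(e ** 2)
    kurt = np.mean(e ** 4) / var ** 2
    # Normalised autocorrelation coefficients at lags 1..max_lag.
    acf = np.array([np.mean(e[lag:] * e[:-lag]) for lag in range(1, max_lag + 1)]) / var
    # Assumed aggregation: sum of squared autocorrelation coefficients.
    return kurt - lam * np.log10(var) - rho * np.sum(acf ** 2)

# Test signal: stable AR(2) process driven by unit-variance Laplacian noise.
rng = np.random.default_rng(0)
a_true = np.array([1.2, -0.6])
w = rng.laplace(scale=1.0 / np.sqrt(2.0), size=5000)
y = np.zeros_like(w)
for n in range(w.size):
    past = sum(a_true[k] * y[n - 1 - k] for k in range(a_true.size) if n - 1 - k >= 0)
    y[n] = past + w[n]

j_true = cost(a_true, y)
j_zero = cost(np.zeros(2), y)
```

For this test signal the cost is larger at the true coefficients than at the all-zero vector, because the inverse filter then returns the white, heavy-tailed driving noise.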

Obviously, Eq. (5) represents a non-linear cost function. A
number of unconstrained optimisation algorithms, such as
simplex search, gradient-type methods and genetic algorithms,
can be used for this maximisation process. In our study, we
employed a quasi-Newton gradient-type algorithm with the
BFGS Hessian updating, developed by Broyden, Fletcher,
Goldfarb and Shanno, due to its wide availability, for example
in Microsoft Excel™ and Matlab™. It is believed that the
BFGS method is the most effective gradient-type method for
general purpose non-linear optimisations. The initial parameters
for the AR model can be obtained using an MPAR parameter
estimation method, such as the Yule-Walker method. The
MPAR parameters are then replaced by the mixed-phase
parameters during iteration. The model order can be obtained by
the Akaike Information Criterion (AIC), which is thought to be
sufficient for gear fault detection.
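The initialisation step can be sketched as follows: a Yule-Walker estimate of the minimum-phase AR parameters and an AIC-based order choice. This is a generic textbook formulation under the convention $y[n] = \sum_k a[k]\, y[n-k] + w[n]$, not the authors' code; the function names, the AIC form $N \ln \hat{\sigma}^2 + 2p$, and the AR(2) test signal are assumptions made for illustration.

```python
import numpy as np

def yule_walker(y, order):
    """Solve the Yule-Walker equations for a[1..order] and the noise variance."""
    y = y - y.mean()
    n = y.size
    # Biased autocovariance estimates r[0..order].
    r = np.array([np.dot(y[:n - k], y[k:]) / n for k in range(order + 1)])
    lags = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    a = np.linalg.solve(r[lags], r[1:])   # Toeplitz system R a = r[1:]
    sigma2 = r[0] - np.dot(a, r[1:])      # prediction error variance
    return a, sigma2

def aic_order(y, max_order):
    """Order minimising N ln(sigma^2) + 2 p over p = 1..max_order."""
    n = y.size
    aic = [n * np.log(yule_walker(y, p)[1]) + 2 * p for p in range(1, max_order + 1)]
    return 1 + int(np.argmin(aic))

# Test signal: stable AR(2) process driven by unit-variance Gaussian noise.
rng = np.random.default_rng(0)
a_true = np.array([1.2, -0.6])
w = rng.standard_normal(5000)
y = np.zeros_like(w)
for n in range(w.size):
    past = sum(a_true[k] * y[n - 1 - k] for k in range(a_true.size) if n - 1 - k >= 0)
    y[n] = past + w[n]

a_hat, sigma2_hat = yule_walker(y, 2)
order_hat = aic_order(y, 6)
```

The resulting `a_hat` could then serve as the starting point for a quasi-Newton maximisation of the cost function, for instance by minimising its negative with a BFGS routine such as `scipy.optimize.minimize(..., method="BFGS")`.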

APPLICATION  TO  THE  CH46  HELICOPTER 
TRANSMISSION  SEEDED  FAULT  TEST  DATA 

The proposed NMPAR technique has been applied to the well-known
CH46 helicopter aft transmission seeded fault testing
data [6] (available at http://wisdom.arl.psu.edu/Westland/data/).
The U.S. Navy sponsored the test, which was carried out in 1993 at
Westland Helicopters Ltd. in the U.K. using its Universal
Transmission Test Rig. The data used for this paper were
acquired in Test #6 and Test #7, where gear crack propagation
tests were conducted on the double helical idler and the collector
gear, shown in Fig. 2 as Components #8 and #6 (both shaded),
respectively.


Fig.  2.  A  diagram  of  CH46  helicopter  aft  transmission. 


In Test #6, the gearbox was tested with the port (i.e., the
shaded) double helical or herringbone idler gear containing a
spark-eroded notch in the roots of two teeth on each helical gear
at an identical shaft angle. In the CH46 mix gearbox, there were
two herringbone idler gears (Component #8 with 72 teeth)
connecting to two input pinions (Component #9) on the turbine
engine shafts, and two spur idler pinions (Component #7 with 25
teeth) on the same shafts connecting to the collector gear. Sensor
#2 (at the port-engine-input location) was chosen for the purpose
of this paper because of its close vicinity to the faulty gear. The
data were acquired when the cracks had grown to 3~19mm and
2~22mm including the initial notch, and the gearbox was running
at 100% power (Record #446 in [6]).

The  averaged  signal  was  normalised  to  zero-mean  and  unit 
variance  and  is  shown  in  Fig.  3(a).  The  spectral  analysis  of  this 
signal  showed  that  the  gear  meshing  harmonics  of  the  25-tooth 
spur  pinion  were  dominant.  By  removing  the  mesh  harmonics 
from  both  the  25-tooth  pinion  and  the  72-tooth  herringbone  gear, 
together  with  their  lower  shaft-order  sidebands,  the  conventional 
residual  signal  (with  kurtosis  of  3.07)  was  obtained  and  is  shown 
in Fig. 3(b). Obviously, no convincing diagnostic information
can be found from the figure because the cross-gear interactive
components [2] (i.e., $m \times 72 \pm n \times 25$; $m, n = 1, 2, \ldots$) still
dominated the residual signal.

Fig. 3(c, d) respectively show the error signals of a
minimum-phase AR(76) model obtained using the Yule-Walker
method and AIC, and of a non-minimum-phase AR(76) model obtained by
maximising the cost function shown in Eq. (5) with $\lambda = 3$ and $\mu = 6$.
In Fig. 3(c), the error signal was reduced significantly compared
to the conventional residual signal shown in Fig. 3(b), but still
no evidence of a fault could be found. As can be seen in Fig. 3(d),
a distinct spike at about 315° of shaft angle was detected by the
NMP-AR error signal with a kurtosis of 9.32, which strongly
indicated the existence of a localised gear fault in either the
25-tooth spur pinion or the 72-tooth herringbone gear.
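The role of kurtosis as a fault indicator can be illustrated with a small sketch. It assumes the error signal of a well-fitted model is roughly white Gaussian noise (kurtosis near 3) and shows how a single impulsive sample raises the value. The record length, spike amplitude and noise model are hypothetical and are not taken from the test data.

```python
import random


def kurtosis(x):
    """Normalised fourth moment m4 / m2**2 (equals 3 for Gaussian data)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / (m2 * m2)


random.seed(1)
n_samples = 4096                         # samples per shaft revolution (assumed)
noise = [random.gauss(0.0, 1.0) for _ in range(n_samples)]

# Plant one impulsive sample at 315 degrees of shaft angle.
spiky = list(noise)
spiky[int(315 / 360 * n_samples)] = 10.0

k_noise, k_spiky = kurtosis(noise), kurtosis(spiky)
print(round(k_noise, 2), round(k_spiky, 2))
```

Because the fourth moment weights large excursions heavily, one localised spike is enough to lift the kurtosis well above the Gaussian value, which is why it suits the detection of localised tooth faults.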


Fig. 3. Analysis result for Test #6 (helical idler gear crack
propagation) with Sensor #2. (a) The normalised signal of the
idler gear (K = 2.63), (b) the residual signal with mesh harmonics and
lower shaft-order sidebands (from both 25T and 72T gears)
removed, (c) the AR(76) model error signal and (d) the error
signal of a non-minimum-phase AR(76) model.

In Test #7, a fatigue crack was initiated using a spark-eroded
notch in the root of one tooth of the collector gear. After
a total of 63 hours of testing, the crack propagated across the
root of the tooth and brought the test to conclusion when the
tooth detached from the gear [6]. The 74-tooth collector gear
combined  the  drive  from  port  and  starboard  engine  inputs  (180° 
apart)  to  the  two  main  rotor  transmission  gearboxes.  At  the  ends 
of  the  collector  gear  shaft,  there  were  two  spiral  bevel  pinions 
(Component  #5  with  26  teeth)  driving  the  front  and  aft  main 
rotor  gearboxes.  Hence,  the  multiple  gears  on  the  same  shaft 
would  produce  cross-gear  interactions  [2]  as  their  tooth  meshing 
components  modulate  with  each  other. 

Fig.  4(a)  shows  the  normalised  signal  acquired  at  75% 
power  by  Sensor  #3  (Record  #1543  in  [6]  at  50.3  test-hours), 
where the initial notch (length × depth = 36 × 5 mm, rectangular
shape) had grown by 7~10mm on the aft end of the notch and
3~6mm on the forward end. Using the conventional residual
signal method, it was found extremely difficult, if not
impossible, to remove all the interactive components (i.e., $m \times 74 \pm n \times 26$;
$m, n = 1, 2, \ldots$). Fig. 4(b) shows the residual signal with
the removal of the meshing harmonics from both 74-tooth and
26-tooth gears and their lower-order sidebands. As can be seen,
the residual signal was still dominated by some periodic
components, which are most likely due to the cross-gear
interaction. Again, the residual signal is of little value for fault
detection.

By applying a minimum-phase AR(45) model to the data,
the inverse filter error signal was obtained and is shown in Fig.
4(c), where no evidence of sudden changes can be found. The
kurtosis value of the error signal was found to be 3.20. Fig. 4(d)
presents the inverse filter error signal produced by the proposed
approach using a non-minimum-phase AR(45) model. From this
signal, we can easily identify two distinct spikes at about 150°
and 330° of shaft angle respectively. The corresponding kurtosis
was found to be 7.21, which strongly indicated the existence of
the tooth cracking. By zooming into the spikes, we found they
were exactly 180° apart. This is because the monitored collector
gear engaged with two pinions, i.e., Component #7 as shown in
Fig. 2, so that a local impact would be produced by the mesh of
the cracked tooth with each of the two pinions.


Fig. 4. Analysis results for Test #7 (collector gear crack
propagation) with Sensor #3. (a) The normalised signal of the
collector gear (K = 1.39), (b) the residual signal with mesh harmonics and
lower shaft-order sidebands (from both 74T and 26T gears)
removed, (c) the MP AR(45) model error signal and (d) the
NMP-AR(45) model error signal.

CONCLUSION 

The analysis results presented in this paper have demonstrated
the effectiveness of the proposed non-minimum-phase AR
modelling method by kurtosis maximisation for gear fault
detection. The method is potentially superior to the current gear
fault detection methods from the following perspectives: 1)
earlier and more convincing detection of gear tooth cracking; 2)
capability of detecting faults in complex gearboxes, such as
helicopter transmission gearboxes, where cross-gear interactions
are common.

The proposed technique requires the choice of some
coefficients, such as $\lambda$ and $\mu$ in the cost function, and the
selection of non-linear optimisation algorithms. In our study, we
mainly concentrated on the gradient-type optimisation
algorithms and we chose $\lambda$ and $\mu$ by putting all three terms in
the cost function onto a similar scale. For future work, we need
to develop more systematic approaches for choosing $\lambda$ and $\mu$
and to test other popular optimisation methods, such as neural
network based non-linear optimisation and genetic algorithms,
for suitability in gear fault detection.

Acknowledgment:  The  author  would  like  to  thank  Dr.  A.K. 
Wong  and  Mr.  S.A.  Fisher  of  AMRL  for  their  comments  on  the 
paper,  and  Mr.  Bill  Hardman  and  Mr.  Mark  Hollins  of  the  U.S. 
Navy  Air  Warfare  Center  for  their  support  to  this  work. 

REFERENCES 

[1] B. Randall, "A New Method of Modeling Gear Faults,"
ASME Journal of Mechanical Design, Vol. 104, April
1982, pp. 259-267.

[2] M. Zacksenhouse, S. Braun, M. Feldman and M.
Sidahmed, "Toward Helicopter Gearbox Diagnostics from
a Small Number of Examples," Mechanical Systems and
Signal Processing, Vol. 14, No. 4, July 2000, pp. 523-543.

[3] W. Wang and A.K. Wong, "A Model-based Gear
Diagnostic Technique," DSTO Technical Report
DSTO-TR-1079, Dec. 2000, Australia.

[4] W. Wang and A.K. Wong, "Linear Prediction and Gear
Fault Diagnosis," Proceedings of the 13th International
Congress on Condition Monitoring and Diagnostic
Engineering Management, Dec. 3-8, 2000, Houston, Texas,
USA, pp. 797-807.

[5] W. Wang and A.K. Wong, "Autoregressive Model Based
Gear Fault Diagnosis," submitted to ASME Journal of
Vibration and Acoustics.

[6] "Final Report on CH-46 Aft Transmission Seeded Fault
Testing," Westland Research Paper RP907 (available at:
http://wisdom.arl.psu.edu/Westland/report).

[7] J.K. Tugnait, "Estimation of Linear Parametric Models
Using Inverse Filter Criteria and Higher Order Statistics,"
IEEE Trans. on Signal Processing, Vol. 41, No. 11, Nov.
1993, pp. 3196-99.

[8] O. Shalvi and E. Weinstein, "New Criteria for Blind
Deconvolution of Nonminimum Phase Systems
(Channels)," IEEE Trans. on Information Theory, Vol. 36,
No. 2, March 1990, pp. 312-321.

[9] C.B. Papadias, "Globally Convergent Blind Source
Separation Based on a Multiuser Kurtosis Maximisation
Criterion," IEEE Trans. on Signal Processing, Vol. 48, No.
12, Dec. 2000, pp. 3508-3519.




A  Hyperbolic  LMS  Algorithm  for  CORDIC  Based  Realization 

Mrityunjoy  Chakraborty,  Suraiya  Pervin  and  T.  S.  Lamba 
Dept. of Electronics and Electrical Communication Engineering,
Indian Institute of Technology, Kharagpur-721302, W.B., India,
email: {mrityun, pervin}@ece.iitkgp.ernet.in


ABSTRACT 

An  alternate  formulation  of  the  LMS  algorithm  is 
presented  by  expressing  the  mean  square  error  as  a 
convex  function  of  a  set  of  hyperbolic  variables  that  are 
monotonically  related  to  the  filter  tap  weights.  The 
proposed  algorithm  is  ideally  suitable  for  CORDIC  based 
realization  and  possesses  very  good  convergence 
characteristics  as  revealed  via  extensive  simulation 
studies. 

Key  words:  LMS  algorithm,  CORDIC  arithmetic,  Time 
varying  system. 


1.  INTRODUCTION 

The CORDIC algorithm, originally proposed by Volder [1],
shot to prominence in the last two decades, as it offered
scope for efficient implementation of a large class of
signal processing algorithms. The popularity of the
CORDIC method can be traced to its numerical stability,
efficiency in evaluating trigonometric and hyperbolic
functions and inherent pipelinability at the microlevel [2].
For the case of the LMS-based adaptive filters,
however, the CORDIC based approach has so far
remained confined largely to lattice filters ([3]-[4]) and
seemingly has not been extended to the transversal form.
This  is  because,  in  the  case  of  the  former,  the 
computations  in  each  stage  can  be  related  easily  to  a  set 
of  hyperbolic  operations,  while  no  such  direct 
hyperbolic,  or,  trigonometric  interpretation  exists  for  the 
computations  present  in  the  latter.  In  this  paper,  we 
propose  an  alternate  formulation  of  the  LMS  algorithm 
using  a  set  of  hyperbolic  variables  which  are 
monotonically  related  to  the  transversal  filter 
coefficients.  The  proposed  hyperbolic  LMS  (HLMS) 
algorithm  leads  to  a  class  of  pipelined  architectures  and 
possesses  satisfactory  convergence  characteristics  as 
demonstrated  via  simulation  studies. 


2.  A  HYPERBOLIC  FORMULATION  OF  THE 
LMS  ALGORITHM 

We begin by first considering the steepest descent search
procedure that arises in the optimal FIR filtering
problem. Given an input sequence $x(n)$, desired
response $d(n)$ and an $N$-tap filter coefficient vector
$\mathbf{w} = [w_0, w_1, \ldots, w_{N-1}]^t$, the optimal filter
$\mathbf{w}^* = [w_0^*, \ldots, w_{N-1}^*]^t$ is obtained by minimizing the mean square error
(MSE) function $\varepsilon^2 = E[e^2(n)]$, where $e(n)$ is the error
signal at the filter output and is given by $e(n) = d(n) - \mathbf{w}^t \mathbf{x}(n)$,
with $\mathbf{x}(n) = [x(n), x(n-1), \ldots, x(n-N+1)]^t$.
The MSE $\varepsilon^2$ is a convex function of the filter coefficients
$w_k$, $k = 0, 1, \ldots, N-1$, and defines the so-called error
performance surface in an $(N+1)$-dimensional space. In the
proposed alternative, we select a set of $N$ hyperbolic
angles $\theta_k$, $k = 0, 1, \ldots, N-1$, and express each tap weight
as $w_k = \sinh\theta_k$, which defines a one-to-one mapping of
an $N$-dimensional $\mathbf{w}$ space to an $N$-dimensional $\boldsymbol{\theta}$ space.
Clearly, the MSE $\varepsilon^2$, when expressed as a function of
$\theta_k$, has a unique minimum at $\theta_k^* = \sinh^{-1} w_k^*$, $k = 0, 1, \ldots, N-1$.
Further, the function $\sinh\theta_k$ is a monotonically
increasing, continuous function of $\theta_k$ as $\theta_k$ varies
from $-\infty$ to $+\infty$, meaning that $\partial\varepsilon^2/\partial\theta_k$ has the same sign as
$\partial\varepsilon^2/\partial w_k$ everywhere within the respective parameter
spaces. In other words, the MSE does not exhibit any
local minimum in the $\boldsymbol{\theta}$ space either. We illustrate this
in Fig. 1, where a 2-tap filter is considered with chosen
MSE $\varepsilon^2 = 1 - 2w_0 - w_1 + 2w_0^2 + 2w_1^2 + 2w_0 w_1$. Fig. 1(a)
shows the constant MSE contours vs. $w_0$, $w_1$ while Fig.
1(b) shows the contours as functions of $\theta_0$ and $\theta_1$. Note
that as we move from Fig. 1(a) to Fig. 1(b), the general
nature of the MSE doesn't change and the minimum is
mapped from $[\tfrac{1}{2}, 0]^t$ to $[0.482, 0]^t$.
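The 2-tap example can be checked numerically. The sketch below is illustrative only: it runs a plain steepest descent on the hyperbolic angles, using the chain rule (the derivative with respect to each angle is cosh of that angle times the derivative with respect to the tap weight). The starting point, step size and iteration count are arbitrary assumptions.

```python
import math


def mse(w0, w1):
    """Example MSE surface of the 2-tap filter considered in the text."""
    return 1 - 2 * w0 - w1 + 2 * w0 ** 2 + 2 * w1 ** 2 + 2 * w0 * w1


def grad_w(w0, w1):
    """Partial derivatives of the MSE with respect to w0 and w1."""
    return (-2 + 4 * w0 + 2 * w1, -1 + 4 * w1 + 2 * w0)


def descend_theta(theta0, theta1, mu=0.05, n_iter=500):
    """Steepest descent on the hyperbolic angles, with w_k = sinh(theta_k)."""
    for _ in range(n_iter):
        g0, g1 = grad_w(math.sinh(theta0), math.sinh(theta1))
        # Chain rule: d(mse)/d(theta_k) = cosh(theta_k) * d(mse)/d(w_k).
        theta0, theta1 = (theta0 - mu * math.cosh(theta0) * g0,
                          theta1 - mu * math.cosh(theta1) * g1)
    return theta0, theta1


theta = descend_theta(-0.5, 0.5)
w = (math.sinh(theta[0]), math.sinh(theta[1]))
print(theta, w, mse(*w))
```

The search settles at tap weights (1/2, 0), i.e. at the angle asinh(1/2), which evaluates to about 0.481 and is close to the value quoted in the text.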

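In the proposed scheme, a steepest descent search is
taken up in the $\boldsymbol{\theta}$ space in order to reach $\boldsymbol{\theta}^*$. The
gradient $\nabla_\theta \varepsilon^2$ is easily seen to be given by
$\nabla_\theta \varepsilon^2 = -2\Delta(\mathbf{p} - \mathbf{R}\mathbf{w})$, where
$\mathbf{w} = [\sinh\theta_0, \sinh\theta_1, \ldots, \sinh\theta_{N-1}]^t$,
$\mathbf{p} = E[\mathbf{x}(n)d(n)]$, $\mathbf{R} = E[\mathbf{x}(n)\mathbf{x}^t(n)]$ and $\Delta$ is a
diagonal matrix with $j$-th diagonal entry given by
$\Delta_{j,j} = \cosh\theta_j$, $j = 0, 1, \ldots, N-1$. The iterate $\boldsymbol{\theta}(i)$ arising
in the $i$-th step of iteration is then updated as:

$$\boldsymbol{\theta}(i+1) = \boldsymbol{\theta}(i) - \mu \nabla_\theta \varepsilon^2 \big|_{\boldsymbol{\theta} = \boldsymbol{\theta}(i)}$$

where $\mu$ is some appropriate step size. To move from
the steepest descent to the LMS form, we simply replace
$\mathbf{R}$ and $\mathbf{p}$ by $\mathbf{x}(n)\mathbf{x}^t(n)$ and $\mathbf{x}(n)d(n)$ respectively in the
expression for $\nabla_\theta \varepsilon^2$ in order to obtain an estimate of the
gradient at index $n$. This leads to the so-called HLMS
algorithm as follows:

$$\boldsymbol{\theta}(n+1) = \boldsymbol{\theta}(n) + \mu \Delta(n)\mathbf{x}(n)e(n), \qquad (1)$$

$$e(n) = d(n) - \sum_{k=0}^{N-1} \sinh\theta_k(n)\, x(n-k). \qquad (2)$$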

The HLMS algorithm is particularly suitable for
CORDIC based realization, since the two quantities
$\sinh\theta_k(n)\,x(n-k)$ and $\cosh\theta_k(n)\,x(n-k)$, $k = 0, 1, \ldots, N-1$,
required for filtering by and updating of the
$k$-th coefficient respectively can be computed
simultaneously by engaging only one CORDIC
processor. For pipelined realization, it may, however, be
more appropriate to consider the hyperbolic analog of an
approximate version of the LMS algorithm, popularly
known as the "Delayed LMS" (DLMS) algorithm [5],
where the filter coefficients at the $n$-th index are updated
using a past estimate of the gradient, say, for index
$(n - L)$, where $L$ is an integer. The correction term in the
weight update formula then gets modified to $\mu\,\mathbf{x}(n-L)e(n-L)$
and the resulting $L$-cycle delay in the error feedback
path is used for retiming purposes. The hyperbolic
analog of the DLMS algorithm can be easily worked out
by substituting $\mathbf{R}$, $\mathbf{p}$ and $\mathbf{w}$ in the gradient expression by
$\mathbf{x}(n-L)\mathbf{x}^t(n-L)$, $\mathbf{x}(n-L)d(n-L)$ and
$[\sinh\theta_0(n-L), \sinh\theta_1(n-L), \ldots, \sinh\theta_{N-1}(n-L)]^t$ respectively and is
given by

$$\boldsymbol{\theta}(n+1) = \boldsymbol{\theta}(n) + \mu \Delta(n-L)\mathbf{x}(n-L)e(n-L). \qquad (3)$$
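The updates (1) to (3) can be sketched in floating-point software as follows. This is an illustrative model, not the pipelined CORDIC realization: it evaluates sinh and cosh with library calls, treats the diagonal matrix as the list of cosh values of the angles, and identifies a hypothetical 3-tap system from noise-free data. Setting `delay=0` gives the HLMS form and `delay=L` the delayed form; the function name, step size and test system are assumptions.

```python
import math
import random
from collections import deque


def hlms(x, d, n_taps, mu, delay=0):
    """Hyperbolic LMS with tap weights w_k = sinh(theta_k).

    The update uses cosh(theta_k), the regressor and the error from
    `delay` samples earlier, as in the delayed (DHLMS) form.
    """
    theta = [0.0] * n_taps
    pending = deque(maxlen=delay + 1)    # update terms waiting to be applied
    errors = []
    for n in range(len(x)):
        # Regressor [x(n), x(n-1), ..., x(n-N+1)], zero before the start.
        reg = [x[n - k] if n >= k else 0.0 for k in range(n_taps)]
        e = d[n] - sum(math.sinh(theta[k]) * reg[k] for k in range(n_taps))
        errors.append(e)
        pending.append(([math.cosh(t) for t in theta], reg, e))
        if len(pending) == delay + 1:
            cosh_old, reg_old, e_old = pending[0]
            theta = [theta[k] + mu * cosh_old[k] * reg_old[k] * e_old
                     for k in range(n_taps)]
    return theta, errors


# Hypothetical noise-free system identification problem.
random.seed(2)
true_w = [0.5, -0.3, 0.2]
x = [random.gauss(0.0, 1.0) for _ in range(5000)]
d = [sum(true_w[k] * (x[n - k] if n >= k else 0.0) for k in range(3))
     for n in range(len(x))]

theta_hlms, _ = hlms(x, d, n_taps=3, mu=0.01)
theta_dhlms, _ = hlms(x, d, n_taps=3, mu=0.01, delay=4)
w_hlms = [math.sinh(t) for t in theta_hlms]
w_dhlms = [math.sinh(t) for t in theta_dhlms]
print([round(w, 3) for w in w_hlms], [round(w, 3) for w in w_dhlms])
```

Both variants store angles rather than weights, so the filter output and the update each need only the products of the input sample with sinh and cosh of the same angle.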


The CORDIC algorithm provides an efficient way of
implementing (2) and (3). This algorithm essentially
rotates a two dimensional vector by running the iterations
$x_{i+1} = x_i + \delta_i 2^{-i} y_i$, $y_{i+1} = \delta_i 2^{-i} x_i + y_i$ and
$\varepsilon_{i+1} = \varepsilon_i - \delta_i \,\mathrm{arctanh}(2^{-i})$, where $\delta_i = \mathrm{sgn}(\varepsilon_i)$, $i = 0, 1, \ldots, M-1$,
and $M$ is the required number of iterations to perform a
hyperbolic operation. After $M$ iterations,
$x_M \to k(x_0\cosh\theta + y_0\sinh\theta)$, $y_M \to k(x_0\sinh\theta + y_0\cosh\theta)$ and
$\varepsilon_M \to 0$, where $k = 1/\prod_{i=0}^{M-1}\cosh(\mathrm{arctanh}(2^{-i}))$ is the so-called
scale factor having a constant value for a
particular machine, $(x_0, y_0)$ is the initial two
dimensional vector and $\varepsilon_0 = \theta$ is the desired angle of
rotation. Fig. 2 shows a CORDIC realization of an $N$-tap
DHLMS-based adaptive filter which achieves microlevel
pipelining by using pipelined CORDIC processor units
(PCU) [6] and pipelined multipliers (PM). Since the
critical path delay arising from the PCU as well as from
the PM amounts to that of a single adder/subtractor, this
architecture can indeed process very high throughput
data, typically of the order of hundreds of megahertz.
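A floating-point model of the hyperbolic CORDIC rotation is sketched below. It follows the usual textbook convention, which is an assumption of this sketch rather than something stated in the paper: the shift index starts at 1, because arctanh(2^0) is unbounded, and indices 4, 13, 40, ... are executed twice to guarantee convergence. The function names and the test angle are hypothetical.

```python
import math


def cordic_indices(m):
    """Shift indices 1..m, with 4, 13, 40, ... repeated for convergence."""
    out, repeat = [], 4
    for i in range(1, m + 1):
        out.append(i)
        if i == repeat:
            out.append(i)
            repeat = 3 * repeat + 1
    return out


def hyperbolic_cordic(x0, y0, theta, m=20):
    """Rotate (x0, y0) through the hyperbolic angle theta by shift-add steps.

    Returns (x, y, residual_angle, k), where x ~ k*(x0*cosh + y0*sinh),
    y ~ k*(x0*sinh + y0*cosh) and k is the accumulated scale factor.
    """
    x, y, eps, k = x0, y0, theta, 1.0
    for i in cordic_indices(m):
        t = 2.0 ** -i
        delta = 1.0 if eps >= 0 else -1.0
        x, y = x + delta * t * y, y + delta * t * x
        eps -= delta * math.atanh(t)
        k *= math.sqrt(1.0 - t * t)      # equals 1 / cosh(atanh(t))
    return x, y, eps, k


# One processor call yields both products needed for a tap of the filter.
sample, angle = 0.8, 0.6
xc, ys, residual, k = hyperbolic_cordic(sample, 0.0, angle)
print(xc / k, sample * math.cosh(angle))
print(ys / k, sample * math.sinh(angle))
```

With the initial vector set to (x(n-k), 0), the two outputs are the scaled cosh and sinh products, which is why a single CORDIC unit can serve both the filtering and the update of one coefficient. The sketch converges only for angles within roughly ±1.118, the sum of the elementary angles.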

3.  SIMULATION  STUDIES  AND  DISCUSSION 

Unlike the conventional LMS, it is very difficult to prove
convergence of the HLMS and the delayed HLMS
(DHLMS) algorithms analytically owing to the presence
of nonlinearities in the form of hyperbolic quantities in
(1), (2) and (3). Both the HLMS and the DHLMS
algorithms, however, have been simulated extensively in
the context of a wide class of applications, and promising
convergence results were observed in each case. In this paper,
we present simulation results for equalizing an AWGN
channel with transfer function
$H(z) = (1 + 2z^{-1})(1 - 0.5z^{-1})(1 + 1.1z^{-1})(1 - 0.6z^{-1})$ and noise variance 0.077.
The transmitted symbols were chosen from an alphabet
of 16 equispaced, equiprobable discrete amplitude levels
with transmitted signal power of 10 dB. A 15-tap
equalizer with centre placed at the 8-th tap position was
used for equalizing the channel and a step size of $\mu = 0.0004$
($0.00005$ for DHLMS) was adopted for the weight
update. The resulting output error characteristics,
displayed in Fig. 3(a) (for HLMS) and in Fig. 3(b) (for
DHLMS) by plotting MSE vs. $n$, confirm satisfactory
convergence properties of the proposed method. These
two figures also present comparative studies of the
convergence performance between LMS and HLMS
(Fig. 3(a)) and also between DLMS and DHLMS
(Fig. 3(b)) algorithms. For this, we have also plotted the
variable $\eta(n)$, given by the ratio of the MSE under LMS
(DLMS) to the MSE under HLMS (DHLMS) algorithm.

Fig. 3(a). Convergence performance of the LMS (series 1) and HLMS (series 2) algorithms;
series 3 represents $\eta$.

Fig. 3(b). Convergence performance of the DLMS (series 1) and DHLMS (series 2) algorithms;
series 3 represents $\eta$.

ABSTRACT

External calibration and compensation of analog-to-digital converters
is considered. Two novel methods are presented. Both
methods employ multiple calibration frequencies in order to improve
the wide-band performance of the converter. Also, a dynamic
table indexing is introduced to further improve the performance.
A recursive sine-wave reconstruction filter algorithm is developed
for calibration purposes. The proposed methods are evaluated
using experimental converter data. Results indicate that the
dynamic indexing provides good correction performance, also at
frequencies not used during calibration. Thus, wide-band calibration
can be achieved.

1.  INTRODUCTION 

The need for broad-band analog-to-digital converters (ADCs) is
increasing rapidly. In third-generation mobile communications,
for instance, the broad-band linearity of the radio receiver ADC is
a crucial property. It is a well-known fact that practical AD converters
suffer from various errors, e.g., gain errors, offset errors
and linearity errors. These errors stem from numerous sources
such as non-ideal spacing of transition levels and timing jitter, to
mention a few, and they contribute to deterioration of the broad-band
performance of the converter. Several methods have been
proposed to externally compensate for such errors, e.g., [5, 8]. External
in this case implies that digital signal processing methods
are used in the calibration and compensation schemes, which operate
outside of the actual converter.

The methods presented in this paper extend that of [4], which
is briefly recapitulated in this section. The scheme operates in two
different modes: calibration and compensation, where the latter
is engaged during normal ADC operation and the former is the
process of analyzing the errors of the ADC. During compensation
a compensation table of size $2^N$, where $N$ is the number of bits in
the ADC output, is used to map each possible converter output $x_i$
into a corresponding corrected value $s_i$. Thus, the ADC operation
becomes $s(t) \to x_i \to s_i$. This is illustrated in Fig. 1. The
corrected values $\{s_i\}$ are chosen to minimize the MSE criterion
in accordance with [6], where it is understood that the corrected
value $s_i$ should equal the mean of all input values $s(t)$ that yield
the ADC output $x_i$.

Calibration is performed with a sinusoidal reference signal. By
using optimal sinusoid-reconstruction filtering [3], the input reference
signal is reconstructed in the digital domain to form an estimate
$\hat{s}(k)$, as shown in Fig. 2. Hence, the ADC error characteristics
can be analyzed using $\hat{s}(k)$ and the sampled signal $x(k)$. It
should be noted that the calibration is implemented without any
reference device, such as a "better" ADC or a digitally generated
reference signal (e.g., [10]). In [4] only one single frequency is
used as reference signal, and the results indicate that the error correction
performance deteriorates when evaluated at off-calibration
frequencies.

Fig. 1. Basic compensation system utilizing table mapping. The
output of the ADC is used as address into the compensation table
to produce a compensated value.

Fig. 2. Calibration system utilizing a reference signal reconstruction
filter.


The first extension of the procedure described above is the sequential
multi-tone calibrator described in Section 2. In Section 3
the table indexing is altered to take signal dynamics into account.
Finally, in Section 4 the methods introduced are evaluated using
experimental ADC data.

2.  SEQUENTIAL  MULTI-TONE  CALIBRATION 

An improvement to the methods of [4] is a recursive sine-wave
reconstruction filter algorithm based on [3]. The filter algorithm
is applied in the sequential multi-tone calibrator presented in this
section, as well as in the dynamic calibrator in Section 3. The
adaptive behavior of the filter makes the calibration procedure robust
and enables the employment of a sequential multi-tone calibration
signal.

Single-tone calibration of ADCs usually results in peak performance
at or near the selected calibration frequency. In the sequential
multi-tone calibrator, the same correction table building procedure
as in [4] is utilized but with several sinusoids with different
frequencies applied in succession. In other words, the reference
signal $s(t)$ input to the ADC is described by

$$s(t) = A \sin\bigl(2\pi f(t)\,t + \phi\bigr), \qquad \text{(1a)}$$

$$f(t) = \begin{cases} f_0, & 0 \le t < t_1, \\ f_1, & t_1 \le t < t_2, \\ \;\vdots \\ f_{F-1}, & t \ge t_{F-1}, \end{cases} \qquad \text{(1b)}$$

where $\{f_i\}$ are the different calibration frequencies and $\{t_i\}$ are
the time instants where the frequency changes. It should be understood
that neither $\{f_i\}$ nor $\{t_i\}$ have to be uniformly spaced
sequences, and $\{f_i\}$ does not have to be monotonically increasing
or decreasing. Obviously, $\{t_i\}$ must be an increasing sequence.
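The piecewise-constant frequency in (1) can be sketched as a simple lookup. The frequencies, switching instants and function names below are hypothetical, and the half-open intervals (each tone applies from its switching instant up to, but not including, the next one) are an assumption of this sketch.

```python
import bisect
import math


def tone_frequency(t, freqs, switch_times):
    """Frequency active at time t.

    freqs = [f_0, ..., f_{F-1}]; switch_times = [t_1, ..., t_{F-1}] must be
    increasing. freqs[i] applies for switch_times[i-1] <= t < switch_times[i].
    """
    return freqs[bisect.bisect_right(switch_times, t)]


def reference_signal(t, amp, phase, freqs, switch_times):
    """Sequential multi-tone reference s(t) = A sin(2 pi f(t) t + phi)."""
    f = tone_frequency(t, freqs, switch_times)
    return amp * math.sin(2 * math.pi * f * t + phase)


# Three tones in non-monotonic order with unevenly spaced switching instants.
freqs = [1.0e6, 3.0e6, 2.0e6]
switch_times = [1.0e-3, 2.5e-3]

print(tone_frequency(0.0, freqs, switch_times))      # first tone
print(tone_frequency(1.0e-3, freqs, switch_times))   # second tone starts here
print(tone_frequency(5.0e-3, freqs, switch_times))   # last tone persists
```

Only the switching instants need to be ordered; the tone frequencies themselves may appear in any order, as the text notes.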

The recursive reconstruction filter comprises two parts. The
first part is an LMS-based frequency estimation, producing a rough
estimate of the reference signal frequency. The second part is a
sine-wave reconstruction filter, based on [3], which adapts to the
reference signal through an LMS recursion. The reconstruction
filter utilizes the frequency estimate to ensure rapid convergence
to the global minimum. See [7] for a thorough description of the
algorithms. By using this adaptive reconstruction filter the calibration
frequencies $\{f_i\}$ and the time instants $\{t_i\}$ in (1) can be
unknown to the calibration algorithms. Thus, the communication
between the reference signal generator and the calibration algorithms
is reduced to a binary control signal (calibrate or compensate).

The calibration of the ADC is summarized in Table 1, where
the averaging of reconstructed input samples is implemented as
a running average. The outcome of using several calibration frequencies
in this manner is that the correction table will compensate
for the average error over all calibration frequencies. During compensation
the same compensated value $s_i$ will be returned for a
certain ADC output $x_i$ regardless of the signal frequency. Thus,
true dynamic compensation has not been achieved, although the
performance has been improved compared with that of [4] with
a small increase in complexity. The next section will discuss an
extension to true dynamic compensation.


Table 1. Summary of the sequential multi-tone calibration procedure.

1. Initialize the table, e.g., $s_i = x_i$.

2. Apply the reference signal $s(t)$ in (1) to the ADC input.

3. Sample the reference signal to produce one sample $x(k)$.

4. Calculate the reference signal estimate $\hat{s}(k)$ using the recursive
reconstruction filter.

5. If the filter has converged and $x(k) = x_i$, update table entry
$s_i$ as $s_i \to \frac{A_i(k)\, s_i + \hat{s}(k)}{A_i(k) + 1}$, where $A_i(k)$ is the number of
times $s_i$ has been updated before.

6. Repeat from 3 with the next sample.
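The table-building loop of Table 1 can be sketched as follows. The reconstruction filter is not modelled: the sketch assumes its estimates and a per-sample convergence flag are supplied by the caller, and it takes the running average to weight the old entry by the number of previous updates. Function and variable names, and the 3-bit example data, are hypothetical.

```python
def build_table(codes, estimates, converged, n_bits):
    """Running-average correction table for an n_bits ADC.

    codes[k] is the ADC output code, estimates[k] the reconstructed input
    and converged[k] whether the reconstruction filter had converged.
    """
    size = 2 ** n_bits
    table = [float(i) for i in range(size)]   # step 1: s_i = x_i
    counts = [0] * size                       # updates made to each entry
    for code, s_hat, ok in zip(codes, estimates, converged):
        if not ok:
            continue                          # step 5 applies only after convergence
        table[code] = (counts[code] * table[code] + s_hat) / (counts[code] + 1)
        counts[code] += 1
    return table, counts


# Toy 3-bit example, with inputs expressed in code units.
codes = [2, 2, 2, 2, 5]
estimates = [9.9, 2.2, 2.4, 2.6, 5.3]
converged = [False, True, True, True, True]

table, counts = build_table(codes, estimates, converged, n_bits=3)
print(table[2], counts[2], table[5], table[0])
```

Entry 2 ends up at the mean of the three accepted estimates, the unconverged sample is ignored, and entries never addressed keep their initial values.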


Fig. 3. Calibration system with dynamic table indexing. Q denotes
a quantizer, i.e., a reduction of the number of bits.

3.  DYNAMIC  CALIBRATION 

Since the errors to be compensated for are in most cases frequency
dependent, it would be desirable if the addressing of the
compensation table involved information about the signal dynamics.
Such an approach is now introduced.

The scheme utilizes a compound index $I$ to address the table.
For each sample $s(k)$ of the input signal, the ADC maps the value
into an output $x_i(k)$, where the index $i = i(k)$ at time-instant $k$
is in $\{0, 1, \ldots, 2^N - 1\}$. The index $i(k)$ can be represented as a
binary number using $N$ bits,

$$i(k) = b_{N-1}(k)2^{N-1} + b_{N-2}(k)2^{N-2} + \ldots + b_0(k)2^0 = [b_{N-1} b_{N-2} \cdots b_0]_2(k), \qquad (2)$$

where $b_{N-1}$ is the most significant bit (MSB), and where $[\cdot]_2$ denotes
binary representation.

Now, let $\tilde{i}(k)$ be a binary number consisting of the $K$ most
significant bits in $i(k)$, that is,

$$\tilde{i}(k) = [b_{N-1} b_{N-2} \cdots b_{N-K}]_2(k). \qquad (3)$$

Let the compound index $I(k)$ at time-instant $k$ consist of all $N$
bits of $i(k)$ followed by the $M_b$ previous $\tilde{i}$-s and the $M_f$ future $\tilde{i}$-s,
according to

$$I(k) = [\,i(k)\ \tilde{i}(k-1) \cdots \tilde{i}(k-M_b)\ \tilde{i}(k+1) \cdots \tilde{i}(k+M_f)\,]_2. \qquad (4)$$

Fig. 4. Compensation system with dynamic table indexing.

Fig. 5. Typical output spectrum of uncalibrated (left) and calibrated (right) ADC, in both cases employed with a full-scale single sine-wave.
The improvement is given in terms of SFDR and SINAD.

Note that $K$ in (3) can be different for the various lags in (4), i.e.,
the different $\tilde{i}$-s need not be of the same number of bits.

Calibration is performed in the same manner as in the sequential
multi-tone method (see Table 1) using the compound index
$I(k)$ for table addressing instead of the ADC output $x(k)$, as
depicted in Fig. 3. Correspondingly, the compensation scheme is
altered to include the index $I(k)$, so that the complete ADC operation
becomes $s(t) \to x_i \to I \to s_I$, which is shown in Fig. 4.
The compensation table is expanded to include the $2^M$ elements
$s_I$, where $M$ is the total number of bits in $I$.
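Building the compound index amounts to concatenating bit fields. The sketch below assumes integer ADC codes and uses shifts to take the most significant bits; the function names, the bit widths and the sample codes are hypothetical illustrations.

```python
def msbs(code, n_bits, k):
    """The k most significant bits of an n_bits-wide code."""
    return code >> (n_bits - k)


def compound_index(codes, pos, n_bits, back_bits, fwd_bits):
    """Concatenate all bits of codes[pos] with truncated neighbours.

    back_bits[j] is the number of MSBs kept from codes[pos - (j + 1)];
    fwd_bits[j] is the number of MSBs kept from codes[pos + (j + 1)].
    """
    index = codes[pos]
    for lag, width in enumerate(back_bits, start=1):
        index = (index << width) | msbs(codes[pos - lag], n_bits, width)
    for lead, width in enumerate(fwd_bits, start=1):
        index = (index << width) | msbs(codes[pos + lead], n_bits, width)
    return index


# 10-bit codes; keep 2 MSBs of the previous sample, 4 MSBs of the one
# before it, and 4 MSBs of the next sample (illustrative widths).
n_bits, back_bits, fwd_bits = 10, [2, 4], [4]
codes = [0b1111000000, 0b1000000000, 0b0000000001, 0b0101000000]

index = compound_index(codes, 2, n_bits, back_bits, fwd_bits)
total_bits = n_bits + sum(back_bits) + sum(fwd_bits)
print(index, total_bits, 2 ** total_bits)
```

With these widths the index has 20 bits, so the table holds 2^20 entries. The table size therefore grows exponentially with every bit added to the index, which is the practical reason for truncating the neighbouring samples.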

4.  PERFORMANCE 

Both methods introduced in this paper have been evaluated using
experimental ADC data from an Analog Devices AD876 10-bit
20 MSPS converter. Fig. 5(a) shows a typical single sine-wave
spectrum of the ADC output without compensation and Fig. 5(b)
shows typical results after compensation. Performance improvement
can be measured in terms of signal-to-noise and distortion
ratio (SINAD) [footnote 1] and spurious free dynamic range (SFDR) [2, 9].

4.1.  Sequential  multi-tone  calibration 

The sequential multi-tone calibration method is evaluated for performance
at off-calibration frequencies, i.e., frequencies not used
for calibration. Calibration is done at seven frequencies dispersed
over the Nyquist band. The results are presented in Fig. 6. The
results are also compared with single-tone calibration with evaluation
at off-calibration frequencies, and the comparison is presented
in Table 2.

Footnote 1: SINAD is introduced in IEEE Std 1241 [2, 9] and is equivalent to SNR
in IEEE Std 1057 [1].


Table 2. Comparison between sequential multi-tone calibration
and single-tone calibration with evaluation at other frequencies
than the calibration frequency.

SFDR improvement [dB]

             Multi-tone    Single-tone
Average          5.0           3.3
Minimum         -0.8          -6.4

4.2.  Dynamic  calibration 

The performance of the dynamic compensation system was tested
for different index configurations. Let the notation

$$(K_{M_b} \ldots K_1\ \mathbf{N}\ K_{-1} \ldots K_{-M_f})$$

imply that the index is built starting with all $N$ bits of $i(k)$ (indicated
by the bold $\mathbf{N}$), followed by the $K_1$ most significant bits of
$i(k-1)$ up to the $K_{M_b}$ most significant bits of $i(k - M_b)$, and
finally the $K_{-1}$ most significant bits of $i(k+1)$ up to the $K_{-M_f}$
most significant bits of $i(k + M_f)$. For example, $(4\ 2\ \mathbf{10}\ 4)$ means
that the index is the concatenation of all 10 bits of $i(k)$, the 2 MSBs
of $i(k-1)$, the 4 MSBs of $i(k-2)$ and the 4 MSBs of $i(k+1)$.

The tests were conducted in the following manner for every
index configuration:

1. Reset and initialize the compensation table.

2. Calibrate the table using $n_{\mathrm{fcal}}$ different frequencies, chosen
at random in the Nyquist band.

3. Evaluate the compensation performance at $n_{\mathrm{feval}}$ different
frequencies, chosen at random but not coinciding with any
of the frequencies used for calibration. Calculate the SFDR
improvement for each evaluation frequency.

4. Repeat from 1. Test each configuration $n_{\mathrm{test}}$ times.

Fig. 6. SFDR improvement when calibrating at seven different
frequencies and evaluating at a frequency $f_{\mathrm{test}}$ not used for calibration.


In the presentation below, the median improvement over all
tests is reported for every configuration. As an indication
of how well the calibration signals managed to excite the
compensation table, the 'fill rate' is also reported: the
percentage of the table entries that were actually altered during
calibration.
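
As a small illustration of the fill-rate definition, the sketch below counts the distinct table addresses written during calibration. The function name `fill_rate` and the idea of logging every written address in a list are assumptions made for this example only.

```python
def fill_rate(table_size, touched_indices):
    """Percentage of table entries altered at least once during calibration."""
    return 100.0 * len(set(touched_indices)) / table_size


# Hypothetical log of table addresses written during calibration.
written = [1, 1, 3, 3, 3]
rate = fill_rate(table_size=8, touched_indices=written)
```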

The test illustrates the impact of different index configurations.
In this example, the number of calibration frequencies $n_{\mathrm{fcal}} = 50$,
the number of evaluation frequencies $n_{\mathrm{feval}} = 20$ and the number
of tests per configuration $n_{\mathrm{test}} = 10$. Each calibration frequency
was maintained for 16 384 samples. Fig. 7 shows the outcome of
the test and the different configurations used.

5.  CONCLUSIONS 

Two methods to make ADC calibration work well in a wider
frequency range have been introduced. Both are based on look-up
table compensation with reference signal reconstruction in the
digital domain. The first method, referred to as sequential multi-tone
calibration, successively applies several sinusoids with different
frequencies as reference signals to the ADC during calibration. This
results in better performance over a wider frequency range. In the
second method, referred to as dynamic calibration, the look-up
table indexing is altered to a compound index consisting of the
present sample together with (quantized versions of) past and
future samples. Through this scheme the compensation depends on
the signal dynamics, as do the ADC errors.

The proposed schemes have been evaluated with experimental
AD converter data. The results indicate that the wide-band
performance, in terms of SFDR, of the calibration schemes is superior to
that of [4].
ABSTRACT

We develop in this paper a (semi-)blind channel estimation
algorithm for space-time (ST) block precoded OFDM
transmissions over frequency-selective channels. We establish that
multi-channel identifiability is guaranteed up to one or two
scalar ambiguities, when distinct or identical precoders, respectively,
are employed for even and odd indexed symbol blocks. With
known pilots inserted before precoding, we resolve the residual
scalar ambiguities and show that distinct precoders require
fewer pilots than identical precoders to achieve the same
channel estimation accuracy. Simulation results confirm our
theoretical analysis and illustrate that the proposed
semi-blind algorithm is capable of tracking slow channel
variations and improving the overall system performance relative
to competing differential ST alternatives.

1.  INTRODUCTION 

New applications such as high speed Internet access and
wireless digital television call for high data rate transmissions.
The use of multiple transmit and receive antennas has the
potential to increase the channel capacity, and thus the
maximum achievable rate. Equipped with Space-Time Coding
(STC) at the transmitter and intelligent signal processing at
the receiver, multi-antenna transceivers also offer diversity
and coding advantages over single antenna systems (see [4,6]
for tutorial treatments). But all these enhancements in
capacity, diversity and coding gains can be realized only if the
underlying channels can be acquired at the receiver.

Conventionally, training symbols are transmitted
periodically to assist the receiver in acquiring channel state
information (CSI); see e.g., [2] for ST-OFDM systems.
However, training sequences consume bandwidth and thereby
incur a spectral efficiency loss, especially in rapidly varying
environments. For this reason, blind channel estimators have
received growing attention. Relying on non-redundant and
non-constant modulus precoding, [1] proposed blind channel
estimation and equalization for OFDM-based multi-antenna
systems using cyclostationary statistics. For ST-OFDM, a
deterministic blind channel estimator was derived in [3] for the case
where the channel transfer functions are coprime (no common zeros)
and the transmitted signals have constant modulus (CM).

In this paper, we deal with a linearly precoded ST-OFDM
system with two transmit antennas and derive (semi-)blind
channel identification algorithms for frequency-selective FIR
channels. With properly designed redundant precoders, the
proposed subspace-based blind channel estimator possesses
the following three attractive features: i) it can be applied to
arbitrary signal constellations; ii) it guarantees channel
identifiability regardless of the underlying channel zero locations;
iii) it can estimate multiple channels simultaneously up to one
or two scalar ambiguities.

To enable channel equalization, we also show how to
resolve the residual scalar ambiguities using a minimal number
of pilots that we insert before precoding.

Notation: Bold upper (lower) letters denote matrices (column
vectors); $(\cdot)^*$, $(\cdot)^T$ and $(\cdot)^H$ denote conjugate, transpose
and Hermitian transpose; $\mathcal{R}(\cdot)$ stands for range space; $\mathbf{I}_K$
denotes the identity matrix of size $K$ and $\mathbf{0}$ denotes an all-zero
matrix or vector; diag($\mathbf{x}$) will stand for a diagonal matrix with
$\mathbf{x}$ on its diagonal; $[\cdot]_p$ denotes the $p$th entry of a vector, and
$[\cdot]_{p,q}$ denotes the $(p,q)$th entry of a matrix.

2.  SYSTEM  DESCRIPTION 

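Figure 1 depicts the wireless system considered in this paper,
where the ST transceiver is equipped with two transmit antennas
and one receive antenna as in [4]. Prior to transmission,
the information-bearing symbols are first grouped into blocks
$\mathbf{s}(n)$ of size $K \times 1$. Two different linear block precoders,
denoted by the tall $J \times K$ matrices $\Theta_1$ and $\Theta_2$, one for the even
block indices $2n$ and one for the odd indices $2n+1$, are used
to introduce redundancy ($J > K$). The corresponding $J \times 1$
precoded blocks

$$\tilde{\mathbf{s}}(2n) := \Theta_1 \mathbf{s}(2n) \quad \text{and} \quad \tilde{\mathbf{s}}(2n+1) := \Theta_2 \mathbf{s}(2n+1), \qquad (1)$$

are fed to the ST encoder $\mathcal{M}(\cdot)$. The ST encoder takes as
input two consecutive precoded blocks, $\tilde{\mathbf{s}}(2n)$ and $\tilde{\mathbf{s}}(2n+1)$,
to output the following $2J \times 2$ code matrix:

$$\begin{bmatrix} \tilde{\mathbf{s}}_1(2n) & \tilde{\mathbf{s}}_1(2n+1) \\ \tilde{\mathbf{s}}_2(2n) & \tilde{\mathbf{s}}_2(2n+1) \end{bmatrix} := \begin{bmatrix} \tilde{\mathbf{s}}(2n) & -\tilde{\mathbf{s}}^*(2n+1) \\ \tilde{\mathbf{s}}(2n+1) & \tilde{\mathbf{s}}^*(2n) \end{bmatrix}.$$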

Each block column of this matrix is transmitted over
successive time intervals with the blocks $\tilde{\mathbf{s}}_1(n)$ and $\tilde{\mathbf{s}}_2(n)$ sent
through transmit antennas 1 and 2, respectively.

The frequency-selective channels between the two
transmit antennas and the receive antenna can be modeled as FIR
linear time-invariant filters with impulse responses $\mathbf{h}_i :=
[h_i(0), \ldots, h_i(L)]^T$, $i = 1, 2$, where $L$ is an upper bound
on the channel orders of $\mathbf{h}_1$ and $\mathbf{h}_2$. Moreover, we
assume that OFDM modulation has been deployed to convert
the FIR channels into a set of parallel flat-faded
subchannels (see e.g., [8] for detailed derivations). Let $\mathcal{D}_1$ and $\mathcal{D}_2$
Fig. 1. Block precoded ST-OFDM transceiver model


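S1. Collect the received data blocks $\bar{\mathbf{y}}(n)$ and compute

$$\hat{\mathbf{R}}_{yy} = \frac{1}{N} \sum_{n=0}^{N-1} \bar{\mathbf{y}}(n)\, \bar{\mathbf{y}}^H(n);$$

S2. Determine the eigenvectors $\mathbf{u}_k$, $k = 1, \ldots, 2J - 2K$,
corresponding to the smallest $2J - 2K$ eigenvalues of the
matrix $\hat{\mathbf{R}}_{yy}$; split each vector $\mathbf{u}_k$ into its upper and
lower parts as $\mathbf{u}_k = [\bar{\mathbf{u}}_k^T, \tilde{\mathbf{u}}_k^T]^T$ and form the matrix

$$\mathcal{D}(\mathbf{u}_k) := \begin{bmatrix} \mathrm{diag}(\bar{\mathbf{u}}_k^*) & -\mathrm{diag}(\tilde{\mathbf{u}}_k) \\ \mathrm{diag}(\tilde{\mathbf{u}}_k^*) & \mathrm{diag}(\bar{\mathbf{u}}_k) \end{bmatrix};$$

S3. From these eigenvectors, estimate $[\mathbf{h}_1^T, \mathbf{h}_2^H]^T$ as the left
eigenvector corresponding to the smallest eigenvalue of
$\mathbf{Q}$, where $\mathbf{Q}$ is defined as: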

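be the diagonal matrices corresponding to these subchannels:
$\mathcal{D}_i := \mathrm{diag}[H_i(0), \ldots, H_i(J-1)]$, where $H_i(k) :=
\sum_{l=0}^{L} h_i(l)\, e^{-j 2\pi l k / J}$. Considering two successive received
blocks $\mathbf{y}(2n)$ and $\mathbf{y}(2n+1)$, let us define the super blocks
$\bar{\mathbf{y}}(n)$ and $\bar{\mathbf{s}}(n)$ as: $\bar{\mathbf{y}}(n) := [\mathbf{y}^T(2n), \mathbf{y}^H(2n+1)]^T$ and
$\bar{\mathbf{s}}(n) := [\mathbf{s}^T(2n), \mathbf{s}^T(2n+1)]^T$. Letting $\bar{\mathbf{w}}(n)$ be the
additive noise, the received block $\bar{\mathbf{y}}(n)$ can then be expressed
as (see also [4] for further details):

$$\bar{\mathbf{y}}(n) = \bar{\mathcal{D}}\, \Phi_{12}\, \bar{\mathbf{s}}(n) + \bar{\mathbf{w}}(n) := \mathcal{H}\, \bar{\mathbf{s}}(n) + \bar{\mathbf{w}}(n), \qquad (2)$$

where $\bar{\mathcal{D}}$, $\Phi_{12}$, $\mathcal{H}$ are defined respectively as:

$$\bar{\mathcal{D}} := \begin{bmatrix} \mathcal{D}_1 & \mathcal{D}_2 \\ \mathcal{D}_2^* & -\mathcal{D}_1^* \end{bmatrix}, \quad \Phi_{12} := \begin{bmatrix} \Theta_1 & \mathbf{0} \\ \mathbf{0} & \Theta_2 \end{bmatrix}, \quad \mathcal{H} := \bar{\mathcal{D}}\, \Phi_{12}.$$

When the channel matrices $\mathcal{D}_1$ and $\mathcal{D}_2$ become available at
the receiver, it is possible to demodulate $\bar{\mathbf{y}}(n)$ with diversity
gains by a simple matrix multiplication:

$$\mathbf{z}(n) = \bar{\mathcal{D}}^H \bar{\mathbf{y}}(n) = \begin{bmatrix} \mathcal{D}_{12}\Theta_1 & \mathbf{0} \\ \mathbf{0} & \mathcal{D}_{12}\Theta_2 \end{bmatrix} \bar{\mathbf{s}}(n) + \bar{\mathcal{D}}^H \bar{\mathbf{w}}(n), \qquad (3)$$

where the diagonal matrix $\mathcal{D}_{12} := \mathcal{D}_1^*\mathcal{D}_1 + \mathcal{D}_2^*\mathcal{D}_2$ equals
$\mathrm{diag}\big[\sum_{i=1}^{2} |H_i(e^{j0})|^2, \ldots, \sum_{i=1}^{2} |H_i(e^{j 2\pi (J-1)/J})|^2\big]$. Eq. (3)
reveals that zero-forcing recovery of $\bar{\mathbf{s}}(n)$ from $\mathbf{z}(n)$ requires
the matrices $\mathcal{D}_{12}\Theta_i$, $i \in \{1, 2\}$, to have full column rank.
Because the channels have maximum order $L$, $\mathcal{D}_{12}$ has at most
$L$ zero diagonal entries. Hence, the full rank of $\mathcal{D}_{12}\Theta_i$ can
always be assured if we adopt the following design conditions
on the block lengths and the linear precoders:

a1) $J \geq K + L$;

a2) $\Theta_i$, $i \in \{1, 2\}$, is designed so that any $K$ rows of $\Theta_i$ are
linearly independent.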

Based  on  al)  and  a2),  our  objective  in  this  paper  is  to  de¬ 
velop  a  subspace-based  (semi-)  blind  multichannel  estimation 
algorithm. 
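
The decoupling stated in (3) can be checked numerically on a per-subcarrier basis. The sketch below is illustrative only: the function names, the short example channels and the test symbols are assumptions, the noise is omitted, and the inputs are already-precoded blocks (so the check concerns the $\bar{\mathcal{D}}^H \bar{\mathcal{D}}$ structure rather than the precoders).

```python
import cmath


def freq_response(h, J):
    """H(k) = sum_l h[l] * exp(-j*2*pi*l*k/J) for k = 0..J-1."""
    return [sum(tap * cmath.exp(-2j * cmath.pi * l * k / J)
                for l, tap in enumerate(h))
            for k in range(J)]


def transmit(H1, H2, s_even, s_odd):
    """Noise-free received blocks for the ST code matrix of Section 2.

    Slot 2n:   antenna 1 sends s_even,       antenna 2 sends s_odd.
    Slot 2n+1: antenna 1 sends -conj(s_odd), antenna 2 sends conj(s_even).
    """
    y_even = [a * e + b * o for a, b, e, o in zip(H1, H2, s_even, s_odd)]
    y_odd = [-a * o.conjugate() + b * e.conjugate()
             for a, b, e, o in zip(H1, H2, s_even, s_odd)]
    return y_even, y_odd


def combine(H1, H2, y_even, y_odd):
    """Apply D-bar^H to [y(2n); conj(y(2n+1))], subcarrier by subcarrier."""
    z_even, z_odd = [], []
    for a, b, ye, yo in zip(H1, H2, y_even, y_odd):
        yo_c = yo.conjugate()
        z_even.append(a.conjugate() * ye + b * yo_c)
        z_odd.append(b.conjugate() * ye - a * yo_c)
    return z_even, z_odd


# Hypothetical two-tap channels and precoded blocks on J = 4 subcarriers.
J = 4
H1 = freq_response([1.0, 0.5j], J)
H2 = freq_response([0.3, -0.2], J)
s_even = [1 + 1j, -1 + 1j, 1 - 1j, -1 - 1j]
s_odd = [1 - 1j, 1 + 1j, -1 - 1j, -1 + 1j]
y_even, y_odd = transmit(H1, H2, s_even, s_odd)
z_even, z_odd = combine(H1, H2, y_even, y_odd)
gains = [abs(a) ** 2 + abs(b) ** 2 for a, b in zip(H1, H2)]  # diagonal of D_12
```

Each combined output equals the corresponding transmitted symbol scaled by the real gain $|H_1|^2 + |H_2|^2$, which is the diagonal structure of $\mathcal{D}_{12}$ in (3).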


3. (SEMI-)BLIND MULTICHANNEL ESTIMATION

At the receiver, we collect $N$ received blocks $\bar{\mathbf{y}}(n)$, with:

a3) $N$ large enough ($\geq 2K$) so that $\mathbf{S}_N \mathbf{S}_N^H$ has full rank $2K$,
where $\mathbf{S}_N := [\bar{\mathbf{s}}(0), \ldots, \bar{\mathbf{s}}(N-1)]$.

Under a1), a2) and a3), a consistent blind channel
estimator has been developed in [4,9]. We summarize the resulting
algorithm in the following steps:


Q  :=;F[2>(ui )*,..., P(u2./_2a-)*], 


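where

$$\mathcal{F} := \begin{bmatrix} \mathbf{V}^T & \mathbf{0} \\ \mathbf{0} & \mathbf{V}^H \end{bmatrix}$$

and $\mathbf{V}$ is a tall Vandermonde matrix with $[\mathbf{V}]_{p+1,q+1} = e^{-j 2\pi p q / J}$.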
A problem inherent in all subspace-based estimators is their
relatively slow convergence with respect to the number of
data blocks required. To improve data efficiency and also enable
tracking of slow channel variations, a semi-blind
implementation of the subspace-based method can be devised by
capitalizing on training sequences, which are present anyway
for synchronization and quick channel acquisition in practical
systems. Proceeding as in [5], the semi-blind implementation
of our algorithm is outlined next:

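1. Obtain initial channel estimates $\hat{\mathbf{h}}_1$ and $\hat{\mathbf{h}}_2$ (and thus
$\hat{\mathcal{H}}$) through training (using e.g., [2]); and estimate the
autocorrelation matrix $\mathbf{R}_{yy}$ as ($\sigma_s^2$ denotes symbol
energy): $\hat{\mathbf{R}}_{yy} = \sigma_s^2\, \hat{\mathcal{H}} \hat{\mathcal{H}}^H$.

2. Refine iteratively the autocorrelation matrix each time
a new symbol block $\bar{\mathbf{y}}(N)$ becomes available using a
rectangular sliding window of length $W$:

$$\hat{\mathbf{R}}_{yy}^{(N)} = \hat{\mathbf{R}}_{yy}^{(N-1)} + \frac{1}{W}\big[\bar{\mathbf{y}}(N)\bar{\mathbf{y}}^H(N) - \bar{\mathbf{y}}(N-W)\bar{\mathbf{y}}^H(N-W)\big]. \qquad (5)$$

3. Perform the subspace algorithm based on $\hat{\mathbf{R}}_{yy}^{(N)}$.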
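
A rectangular sliding-window refinement of the autocorrelation estimate can be sketched as follows. The helper names and the $1/W$ normalization are assumptions of this illustration; the sketch only shows that adding the newest outer product and removing the oldest one reproduces a direct average over the shifted window.

```python
def outer(y):
    """Outer product y y^H of a complex vector given as a list."""
    return [[a * b.conjugate() for b in y] for a in y]


def window_average(blocks):
    """Direct estimate: average of y y^H over the given blocks."""
    n = len(blocks[0])
    acc = [[0j] * n for _ in range(n)]
    for y in blocks:
        P = outer(y)
        for i in range(n):
            for j in range(n):
                acc[i][j] += P[i][j] / len(blocks)
    return acc


def slide_update(R, y_new, y_old, W):
    """Add the newest block and drop the oldest one (window length W)."""
    P_new, P_old = outer(y_new), outer(y_old)
    n = len(R)
    return [[R[i][j] + (P_new[i][j] - P_old[i][j]) / W for j in range(n)]
            for i in range(n)]


# Hypothetical received blocks of length 2 and a window of W = 2 blocks.
y0, y1, y2 = [1 + 0j, 1j], [2 + 0j, 1 + 1j], [0.5j, -1 + 0j]
R_prev = window_average([y0, y1])
R_next = slide_update(R_prev, y_new=y2, y_old=y0, W=2)
R_direct = window_average([y1, y2])
```

The recursive form costs two outer products per new block regardless of $W$, which is presumably why a sliding update is attractive when the estimate must be refreshed often.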


4.  CHANNEL  IDENTIFIABILITY 


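The key question here is whether the solution of S3 is unique.
With the proof provided in [9], we present channel
identifiability results for two precoder choices: identical precoders
and distinct precoders.

Theorem 1 (identical precoders): Suppose a1), a2) and a3)
hold true; if $\Theta_1 = \Theta_2 = \Theta$, the matrix $\mathbf{Q}$ in (4) loses row
rank by two and the resulting estimate $[\hat{\mathbf{h}}_1^T, \hat{\mathbf{h}}_2^H]^T$ belongs
to a two-dimensional vector space that is spanned by $\mathbf{h}_{12} =
[\mathbf{h}_1^T, \mathbf{h}_2^H]^T$ and $\mathbf{h}_{21} = [\mathbf{h}_2^T, -\mathbf{h}_1^H]^T$. The underlying channels
are identified up to two scalar ambiguities as:

$$\begin{bmatrix} \mathbf{h}_1 \\ \mathbf{h}_2 \end{bmatrix} = \frac{1}{|\alpha_1|^2 + |\alpha_2|^2} \begin{bmatrix} \alpha_1^* \mathbf{I}_{L+1} & -\alpha_2 \mathbf{I}_{L+1} \\ \alpha_2^* \mathbf{I}_{L+1} & \alpha_1 \mathbf{I}_{L+1} \end{bmatrix} \begin{bmatrix} \hat{\mathbf{h}}_1 \\ \hat{\mathbf{h}}_2 \end{bmatrix}. \qquad (6)$$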
Theorem 2 (distinct precoders): Suppose a1), a2) and
a3) hold true; let $\mathbf{D}$ denote any diagonal matrix with unit
amplitude diagonal entries, and $\tilde{\Theta}_1$, $\tilde{\Theta}_2$ be formed by any
$J - L$ rows of $\Theta_1$, $\Theta_2$, respectively. If $\Theta_1$ and $\Theta_2$ satisfy:
D©i  K(®2),  the  resulting  estimate  [h|\  h^]T  is  unique 
up  to  a  constant  and  thus  channel  identifiability  within  one 
scalar  is  guaranteed: 


Therefore,  from  the  received  data  only,  multiple  channels  can 
be  estimated  simultaneously  up  to  one  or  two  scalar  ambigu¬ 
ities  with  linearly  block  precoded  ST-OFDM  transmissions. 
To  enable  channel  equalization,  we  show  next  how  to  resolve 
these  scalar  ambiguities  by  inserting  known  symbols  in  the 
transmitted  sequence. 


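Suppose that two known symbols $p_1$ and $p_2$ are placed inside
two consecutive blocks $\mathbf{s}(2m)$ and $\mathbf{s}(2m+1)$ at position $k$.
Letting $s_1 := [\hat{\mathbf{s}}(2m)]_k$ and $s_2 := [\hat{\mathbf{s}}(2m+1)]_k$, we obtain:

$$\begin{bmatrix} s_1 \\ s_2 \end{bmatrix} = \frac{1}{|\alpha_1|^2 + |\alpha_2|^2} \begin{bmatrix} \alpha_1^* & \alpha_2^* \\ -\alpha_2 & \alpha_1 \end{bmatrix} \begin{bmatrix} p_1 \\ p_2 \end{bmatrix}, \qquad (11)$$

from which $\alpha_1$ and $\alpha_2$ can be solved as:

$$\begin{bmatrix} \alpha_1 \\ \alpha_2^* \end{bmatrix} = \frac{1}{|s_1|^2 + |s_2|^2} \begin{bmatrix} s_1^* & s_2 \\ -s_2^* & s_1 \end{bmatrix} \begin{bmatrix} p_1 \\ p_2^* \end{bmatrix}. \qquad (12)$$

With $\alpha_1$ and $\alpha_2$ resolved, the true channels can then be found
from (6). However, this step is not necessary since the symbol
estimates can be obtained directly from (10) by:

$$\begin{bmatrix} \mathbf{s}(2n) \\ \mathbf{s}(2n+1) \end{bmatrix} = \begin{bmatrix} \alpha_1 \mathbf{I}_K & -\alpha_2^* \mathbf{I}_K \\ \alpha_2 \mathbf{I}_K & \alpha_1^* \mathbf{I}_K \end{bmatrix} \begin{bmatrix} \hat{\mathbf{s}}(2n) \\ \hat{\mathbf{s}}(2n+1) \end{bmatrix}. \qquad (13)$$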

5.  RESOLVING  SCALAR  AMBIGUITIES 


To resolve the scalar ambiguities inherent to all blind
channel estimators, known symbols are needed in the transmitted
sequence. We here focus on pre-precoding pilots where the
known symbols are inserted before precoding by $\Theta$;
alternatively, known symbols can be inserted after precoding and we
will call them post-precoding pilots [9].

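We pursue the scalar determination with identical
precoders first. Let $\mathcal{D}_3$ and $\mathcal{D}_4$ denote the diagonal channel matrices
built from the estimated channels. Notice that the estimated channels of (6) satisfy:

$$\begin{bmatrix} \mathcal{D}_3 & \mathcal{D}_4 \\ \mathcal{D}_4^* & -\mathcal{D}_3^* \end{bmatrix} = \begin{bmatrix} \mathcal{D}_1 & \mathcal{D}_2 \\ \mathcal{D}_2^* & -\mathcal{D}_1^* \end{bmatrix} \begin{bmatrix} \alpha_1 \mathbf{I}_J & -\alpha_2^* \mathbf{I}_J \\ \alpha_2 \mathbf{I}_J & \alpha_1^* \mathbf{I}_J \end{bmatrix}, \qquad (8)$$

and thus $\mathcal{D}_{34} := \mathcal{D}_3^*\mathcal{D}_3 + \mathcal{D}_4^*\mathcal{D}_4$ equals:

$$\mathcal{D}_{34} = (|\alpha_1|^2 + |\alpha_2|^2)(\mathcal{D}_1^*\mathcal{D}_1 + \mathcal{D}_2^*\mathcal{D}_2) = (|\alpha_1|^2 + |\alpha_2|^2)\, \mathcal{D}_{12}.$$

Multiplying $\bar{\mathbf{y}}(n)$ by the Hermitian transpose of the left-hand side matrix in (8) yields [c.f. (3)]:

$$\begin{bmatrix} \mathbf{z}(2n) \\ \mathbf{z}(2n+1) \end{bmatrix} = \frac{1}{|\alpha_1|^2 + |\alpha_2|^2} \begin{bmatrix} \alpha_1^* \mathbf{I}_J & \alpha_2^* \mathbf{I}_J \\ -\alpha_2 \mathbf{I}_J & \alpha_1 \mathbf{I}_J \end{bmatrix} \begin{bmatrix} \mathcal{D}_{34}\Theta\, \mathbf{s}(2n) \\ \mathcal{D}_{34}\Theta\, \mathbf{s}(2n+1) \end{bmatrix}, \qquad (9)$$

where, for brevity, we omitted the noise.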

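Because the known symbols are inserted in the data
stream before precoding, we need to equalize the channel
and compensate for the precoding first, before resolving the
residual scalar ambiguities. With identical precoders, a
zero-forcing (ZF) equalizer can be applied to $\mathbf{z}(2n)$ and $\mathbf{z}(2n+1)$
in (9) by pre-multiplying with $(\mathcal{D}_{34}\Theta)^\dagger$, where $\dagger$ stands for
matrix pseudo-inverse. Based on (9), the equalizer outputs
$\hat{\mathbf{s}}(2n) := (\mathcal{D}_{34}\Theta)^\dagger \mathbf{z}(2n)$ and $\hat{\mathbf{s}}(2n+1) := (\mathcal{D}_{34}\Theta)^\dagger \mathbf{z}(2n+1)$
can be written as:

$$\begin{bmatrix} \hat{\mathbf{s}}(2n) \\ \hat{\mathbf{s}}(2n+1) \end{bmatrix} = \frac{1}{|\alpha_1|^2 + |\alpha_2|^2} \begin{bmatrix} \alpha_1^* \mathbf{I}_K & \alpha_2^* \mathbf{I}_K \\ -\alpha_2 \mathbf{I}_K & \alpha_1 \mathbf{I}_K \end{bmatrix} \begin{bmatrix} \mathbf{s}(2n) \\ \mathbf{s}(2n+1) \end{bmatrix}. \qquad (10)$$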


With distinct precoders, we should equalize $\mathbf{z}(2n)$ by
$(\mathcal{D}_{34}\Theta_1)^\dagger$ and $\mathbf{z}(2n+1)$ by $(\mathcal{D}_{34}\Theta_2)^\dagger$. Substituting $\alpha = \alpha_1$
and $\alpha_2 = 0$ in (12), the scalar $\alpha$ can be found as:

$$\alpha = (s_1^* p_1 + s_2 p_2^*) / (|s_1|^2 + |s_2|^2). \qquad (14)$$

Similarly, if $|p_1| = |p_2|$ and thus $|s_1| = |s_2|$, eq. (14) can be
further simplified to $\alpha = (1/2)(p_1/s_1 + p_2^*/s_2^*)$.
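
The pilot-based resolution of the scalars can be illustrated with a short noise-free sketch. It assumes the relations $s_1 = (\alpha_1^* p_1 + \alpha_2^* p_2)/g$ and $s_2 = (-\alpha_2 p_1 + \alpha_1 p_2)/g$ with $g = |\alpha_1|^2 + |\alpha_2|^2$ between the pilots and the equalizer outputs; the function names and numerical values are hypothetical.

```python
def pilots_to_outputs(a1, a2, p1, p2):
    """Equalizer outputs at the pilot position for given scalars (noise-free)."""
    g = abs(a1) ** 2 + abs(a2) ** 2
    s1 = (a1.conjugate() * p1 + a2.conjugate() * p2) / g
    s2 = (-a2 * p1 + a1 * p2) / g
    return s1, s2


def resolve_scalars(s1, s2, p1, p2):
    """Recover (alpha_1, alpha_2) from one pair of known pilots."""
    g = abs(s1) ** 2 + abs(s2) ** 2
    a1 = (s1.conjugate() * p1 + s2 * p2.conjugate()) / g
    a2_conj = (-s2.conjugate() * p1 + s1 * p2.conjugate()) / g
    return a1, a2_conj.conjugate()


# Identical precoders: two unknown scalars.
p1, p2 = 1 + 1j, 1 - 1j
s1, s2 = pilots_to_outputs(0.8 - 0.3j, -0.2 + 0.5j, p1, p2)
a1_hat, a2_hat = resolve_scalars(s1, s2, p1, p2)

# Distinct precoders: alpha_2 = 0, so only alpha = alpha_1 remains.
t1, t2 = pilots_to_outputs(0.6 + 0.9j, 0j, p1, p2)
alpha_hat, zero_hat = resolve_scalars(t1, t2, p1, p2)
alpha_simple = 0.5 * (p1 / t1 + p2.conjugate() / t2.conjugate())
```

In the single-scalar case both pilots contribute to the estimate of the same unknown, which gives some intuition for the averaging advantage of distinct precoders discussed in the remark that follows.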

Remark (advantage of distinct over identical precoders): As
indicated by Theorems 1 and 2, with distinct precoders $\Theta_1$
and $\Theta_2$ the channels can be identified up to one scalar $\alpha$
instead of the two scalars $(\alpha_1, \alpha_2)$ that must be determined with
identical precoders $\Theta_1 = \Theta_2 = \Theta$. With one pair of known
symbols $(p_1, p_2)$, the residual scalar ambiguities can be
resolved by (12) and (14) for pre-precoding pilots. At first glance,
therefore, the advantage of distinct precoders over identical precoders is
not obvious, since two scalars are also not difficult to
resolve for identical precoders as in (12). However, the noise
analysis we detail in [9] for the scalar ambiguity
determination of (12) and (14) reveals that distinct precoders lead to a
3 dB gain over identical precoders in suppressing the
channel estimation error caused by the imperfectly resolved scalar
ambiguities. To achieve the same channel estimation
accuracy, identical precoders need to employ twice the number of
pilots relative to distinct precoders.

In a nutshell, designing distinct precoders instead of
identical precoders pays off either in terms of increasing the
system efficiency by using half the number of pilots, or in terms
of improving the system performance with the same number
of pilots, a feature that we also verified by simulations.

6.  SIMULATIONS 

To test the proposed channel estimation algorithm we use as
figures of merit the averaged Normalized Mean Square
Error (NMSE) of the channels, defined as $(1/2) \sum_{i=1}^{2} \|\hat{\mathbf{h}}_i -
\mathbf{h}_i\|^2 / \|\mathbf{h}_i\|^2$, and the Bit Error Rate (BER). We set the
system parameters as: $L = 8$, $K = 3L$, $J = K + L = 32$;
and generate the channels according to the Channel Model
A specified by ETSI. We assume here that each data burst
has $N = 400$ symbol blocks, in which the first one $\mathbf{s}(0)$ is
not precoded and serves as a training block. The semi-blind
channel estimator is implemented using (5) with $W = 100$
and is initialized using the training based method of [2]. The
channel estimates are updated every 50 blocks in order to keep
the complexity reasonable. To resolve the residual scalar
ambiguities, $N_p = 1, 2, 4$ pairs of pre-precoding pilots are
distributed inside each set of 50 symbol blocks.

To illustrate the advantage of distinct over identical
precoders, we depict in Fig. 2 the NMSE averaged over the entire
data burst for different numbers of pilots. From
Fig. 2, we infer that at high SNR identical precoders indeed
need to double the number of pilots to catch up
with the performance of distinct precoders, which is
consistent with our noise analysis in [9]. To check the overall per¬
formance  of  channel  estimation,  equalization  and  ST  decod¬ 
ing,  we  plot  in  Fig.  3  the  BERs  averaged  over  the  entire  data 
burst  with  ZF  equalizers  constructed  from  different  channel 
estimates.  Compared  to  the  benchmark  BER  performance 
obtained  with  perfect  channel  knowledge  at  the  receiver,  our 
semi-blind  channel  estimator  only  incurs  less  than  2  dB  SNR 
loss,  while  a  high  error  floor  is  observed  for  the  training  based 
approach  since  the  channels  are  time  varying  and  no  tracking 
mechanism  has  been  invoked. 

To illustrate the advantage of channel acquisition and
coherent detection at the receiver, we also plot in Fig. 3 the
BER performance of a competing differential ST-OFDM
alternative, where the differential encoding of [7] is applied on
each subcarrier to dispense with channel estimation. To
match the information rate, convolutional coding with
rate 3/4 ($= K/J$) is also tested for differential ST-OFDM.
Since the differential decoder output takes binary values [7],
the Viterbi decoding algorithm with hard decision is applied
here. Without assuming any side channel information, the
path metric for Viterbi decoding is set to be the Hamming
distance between the received bit stream at the output of the
differential decoder (denoted by $\hat{c}_1, \ldots, \hat{c}_n$, where $\hat{c}_i \in \{0, 1\}$) and
the possible codeword candidates (denoted by $c_1, \ldots, c_n$). If
side information on the channel fading coefficients $\{\beta_i\}_{i=1}^{n}$,
where $\beta_i := |H_1(\rho_i)|^2 + |H_2(\rho_i)|^2$, can be acquired at the
receiver, the path metric could be modified using the weighted
Hamming distance: $\sum_{i=1}^{n} \beta_i (\hat{c}_i - c_i)^2$. Fig. 3 demonstrates
that precoded ST-OFDM equipped with our semi-blind
channel estimator outperforms the differential ST-OFDM
considerably, for both uncoded and coded transmissions over the
considered SNR range.
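
The two path metrics mentioned above differ only in the per-bit weights. The following sketch is an illustration with hypothetical names and values; it is not a Viterbi decoder, only the metric that such a decoder would accumulate along a candidate path.

```python
def path_metric(received_bits, candidate_bits, weights=None):
    """Hamming distance, or weighted Hamming distance if weights are given.

    weights[i] plays the role of the fading coefficient of the subcarrier
    that carried bit i; with no side information all weights are 1.
    """
    if weights is None:
        weights = [1.0] * len(received_bits)
    return sum(w * (r - c) ** 2
               for w, r, c in zip(weights, received_bits, candidate_bits))


received = [1, 0, 1, 1]
candidate = [1, 1, 0, 1]
plain = path_metric(received, candidate)
weighted = path_metric(received, candidate, weights=[0.5, 2.0, 0.25, 1.0])
```

With weights, a disagreement on a strongly faded (small-weight) subcarrier costs less than one on a reliable subcarrier, so unreliable bits influence the decoded path less.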
ABSTRACT

Probability  of  bit  error  expressions  are  derived  for  direct  se¬ 
quence  CDMA  (DS-CDMA)  and  multicarrier  CDMA  (MC- 
CDMA)  with  imperfect  diversity  combining.  Pilot  and  data 
channels  are  transmitted  through  a  Rayleigh  fading  chan¬ 
nel  with  an  exponential  multipath  intensity  profile.  Chan¬ 
nel  statistics  are  estimated  using  simple  integrators.  Then 
the  multipath  in  the  DS  system  and  the  multiple  subcarri¬ 
ers  in  the  MC  system  are  weighted  by  the  imperfect  chan¬ 
nel  estimates  and  combined.  Keeping  the  data  rate,  the 
transmit  power,  and  the  fading  power  constant,  as  the  band¬ 
width  increases,  the  number  of  multipaths  increases  in  the 
DS  system,  and  the  number  of  subcarriers  increases  in  the 
MC  system.  At  the  same  time,  the  signal  strength  in  each 
path/subcarrier  decreases,  and  results  in  larger  errors  in  the 
channel  estimates.  We  show  that  there  is  a  tradeoff  between 
diversity  order  and  SNR  available  for  channel  estimation  in 
both  DS-CDMA  and  MC-CDMA.  Moreover,  we  also  show 
that  MC-CDMA  performs  better  than  DS-CDMA. 

1.  INTRODUCTION 

Future  personal  communication  systems  are  proposed  to  be 
wide-band  to  support  high  rate  applications,  such  as  video 
and  data.  High  bandwidth  brings  the  possibility  of  diversity 
gain.  Among  the  various  diversity  combining  techniques,  it 
has  been  shown  that  maximal  ratio  combining  (MRC)  max¬ 
imizes  output  signal-to-noise  ratio  (SNR).  However,  ideal 
MRC  requires  perfect  knowledge  of  the  channel  fading  statis¬ 
tics  of  each  diversity  branch.  Of  course,  channel  estimates 
will  not  be  perfect  when  the  received  signal  is  corrupted  by 
fading  and  noise.  This  error  in  the  channel  estimates  will 
degrade  performance.  In  a  diversity  system  with  a  fixed  to¬ 
tal  energy,  as  the  number  of  diversity  branches  increases,  the 
energy-per-branch  decreases,  and  the  weaker  signals  result 
in larger errors in the channel estimates. On the one hand, the
diversity gain improves with more diversity branches; on the
other hand, greater degradation is caused by the errors in the
channel estimates. Thus, there is a point beyond which an extra order
of diversity actually degrades overall receiver performance,
due to the decreasing SNR available for estimation.

In this paper, we examine the tradeoff between
diversity order and channel estimation errors in direct sequence
CDMA (DS-CDMA) [1-4] and multicarrier CDMA
(MC-CDMA) [5-9]. In a wideband DS-CDMA system with an
exponential  multipath  intensity  profile  (MIP),  each  resolved 
multipath  has  a  different  average  power.  Thus,  there  will  be 
less  accurate  estimates  for  the  weaker  paths.  In  this  same 
environment,  if  we  trade  path  diversity  for  frequency  diver¬ 
sity,  we  can  design  a  MC-CDMA  system  with  L  subcarriers, 
where  L  is  the  number  of  resolvable  paths  in  the  direct  se¬ 
quence  system.  The  processing  gain  in  each  subcarrier  is 
L  times  smaller  than  the  processing  gain  in  the  direct  se¬ 
quence  system.  The  CDMA  signal  with  lower  processing 
gain  is  repeated  L  times,  once  in  each  of  the  L  subcarriers. 
Each  subcarrier  now  experiences  flat  Rayleigh  fading,  and 
the  average  fade  power  is  the  same  for  each  of  the  subcar¬ 
riers.  We  show  that  MC-CDMA  performs  better  than  DS- 
CDMA,  with  the  extent  of  the  improvement  depending  on 
the  rate  of  decay  of  the  exponential  delay  profile  [10]. 

2.  DIRECT  SEQUENCE  CDMA 

2.1.  Signal  and  Channel  Model 

The direct sequence BPSK signal transmitted by the $k$th
user is

$$s_k(t) = \Re\Big\{ \sum_n u_k[n]\, h(t - nT_c - \tau_k)\, e^{j(\omega_c t + \varphi_k)} \Big\},$$

where $u_k[n] = A\,C_{P_k}[n] + B\,C_{d_k}[n]\,d_k[n]$, $h(t)$ is the Nyquist
chip-shaping filter, $\tau_k$ is the relative time delay of user $k$, $\omega_c$
is the carrier frequency, and $\varphi_k$ is the carrier phase. Also,
$C_{P_k}[n]$ and $C_{d_k}[n]$ are binary orthogonal spreading sequences
for the pilot channel and the data channel, respectively, $A$

Fig.  1.  The  Direct  Sequence  Complex  Channel  Estimator  Block  Diagram. 


and $B$ are their corresponding transmit amplitudes and $d_k[n]$
is binary data with bit interval $T$. The spreading sequences'
chip rate is $1/T_c$, where $T_c = T/N$, and $N$ is the processing
gain.  Assuming  perfect  average  power  control,  the  received 
signal  is  r(t)  =  where 

$$\tilde{r}(t) = \sum_{k=0}^{K-1} \sum_{l=0}^{L-1} \sum_{n=-\infty}^{\infty} \alpha_{k,l}\, e^{j\theta_{k,l}}\, u_k[n]\, h(t - (n + l)T_c - \tau_k) + n_w(t),$$

$K$ is the total number of users, $L$ is the number of resolvable
paths, $\alpha_{k,l}(t)$ and $\theta_{k,l}(t)$ are the amplitude and phase of the
$k$th user's $l$th path, respectively, and $n_w(t)$ is complex white
Gaussian noise, with two-sided spectral density $\eta_0$. The
fading amplitudes $\{\alpha_{k,l}\}$ are independent Rayleigh processes.
Each user has an i.i.d. exponential multipath intensity
profile with normalized decay factor $\delta$, i.e., $E[\alpha_{k,l}^2] = \Omega_0 e^{-\delta l / L}$.
The phases $\{\theta_{k,l}\}$ are i.i.d. uniform random processes in
$[0, 2\pi]$; the delays $\{\tau_k\}$ are independent uniform random
variables in $[0, T_c]$. We assume the channel to be slowly
varying such that the channel fading parameters can be
assumed constant during the estimation interval, $T_I$. The
delay spread is assumed to be sufficiently less than a bit
interval $T$ such that there is no significant intersymbol
interference.
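
The exponential multipath intensity profile can be tabulated with a few lines of code. The sketch below assumes that $\Omega_0$ is chosen so that the total average fading power stays constant as $L$ changes (the numerical results section describes such a renormalization); the function name and default total power of 1 are hypothetical.

```python
import math


def path_powers(L, delta, total_power=1.0):
    """Average powers E[alpha_l^2] = Omega_0 * exp(-delta * l / L), l = 0..L-1.

    Omega_0 is set so that the powers sum to total_power (an assumption of
    this sketch, used to keep the total fading power fixed as L grows).
    """
    raw = [math.exp(-delta * l / L) for l in range(L)]
    omega0 = total_power / sum(raw)
    return [omega0 * r for r in raw]


powers = path_powers(L=4, delta=5.0)   # decay factor of five, four paths
flat = path_powers(L=4, delta=0.0)     # zero decay gives equal-power paths
```

With a large decay factor most of the power sits in the first path, so the later paths yield weak, noisy channel estimates; with zero decay all paths have equal power, as do the subcarriers of the multicarrier system.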


2.2.  Receiver  Model 


is  given  by 


))  =  BNao.ie^0-1  do[(v  —  l)A’y]  +  Ii  +  M(u  +  A,//. 

Self  interference,  multiple  access  interference,  and  noise 
terms  are  defined  in  [10]. 


2.3.  Probability  of  Bit  Error 

For well-chosen spreading sequences with large processing
gain, we can approximate the set $\{W_l^* V_l\}_{l=0}^{L-1}$ as
being independent. With the help of [3, Appendix 4, pp.345],
the probability of bit error is given by


P,i  r+Jt  i  n 

e  2 nj  J_x+jt  v  11 


VllV-21 


{v+jVu){v  ~jv-2l) 


d?7 


L—  1 


-  TT  l’l iV-2i  X  lhn 

V,=o 


V  ~  JV-21 


L-i 

n 


v  (v+jvid)(l>  -  jV2,l) 

<1  =  0 


0) 

(2) 

(3) 


assuming the decay factor $\delta$ of the exponential MIP is
nonzero, and that the integrand in (1) has distinct roots. The
variables $v_{1l}$ and $v_{2l}$ in (3) are defined in [3, equation 4B.6],
where they involve the following second moments of $W_l$
and $V_l$ [10]:


The receiver estimates the fading parameters on each path.
The estimator block diagram is shown in Figure 1. Assume
chip timing, bit timing, and local code generator
synchronization have been established. The channel estimate for
the $l$th path of user 0 consists of the desired term, a self-interference term, a
multiple access interference term, and a noise term:

$$W_l = A N_I\, \alpha_{0,l}\, e^{j\theta_{0,l}} + S_{e,l} + M_{e,l} + N_{e,l}.$$

Their definitions can be found in [10].

Each estimate is updated at the end of the estimation
interval $T_I$. Then the estimates are used to form the decision
statistic $Z = \Re\{Y\}$, where $Y = \sum_{l=0}^{L-1} W_l^* V_l$, and the
complex conjugate is denoted by $*$. Demodulation is similar
to the channel estimation operation, except despreading is
done with the data spreading sequence. The $l$th demodulator output


/'tv,  tv, 


/'v,  v, 


/'iv,  v, 


^1-NjAin0e-^+T^- 
+  ^y(A'-l)(.4J  +  B5)fl„^, 

«  ~ABNiNE[ah\  =  ^ABN,Nn0er6l'L 


In the special case that the estimation interval is one bit
long (i.e., $N_I = N$), and the pilot channel has the same
power as the data channel (i.e., $A = B$), the integrand in (1)
has an $L$th order repeated root at $v = j v_2$, where $v_2 = v_{2l}$,
Fig. 2. Probability of error versus bandwidth for varying
$E_b/\eta_0$ ($E_b/\eta_0$ = 3, 4, 5.2, 7 dB; $K = 10$, $N_I = 2N$, $N = 64L$).
Horizontal axis: bandwidth expansion / number of paths / number of subcarriers ($L$).

Fig. 3. Probability of error versus bandwidth for different
estimation intervals. Horizontal axis: bandwidth expansion /
number of paths / number of subcarriers ($L$).


$\forall l$. Then the probability of bit error is given by


dL~l  fl  Yj  1 

dvL~l  v  (v  +  jvid) 


3.  MULTICARRIER  CDMA 

In  a  multicarrier  CDMA  system,  the  same  user  signal  as  in 
the  direct  sequence  section,  with  the  same  data  rate  but  pro¬ 
portionally  smaller  processing  gain  and  therefore  with  lower 
chip  rate,  is  transmitted  in  L  non-overlapping  sub-bands. 
The  bandwidth  of  the  subcarriers  is  selected  to  be  the  co¬ 
herence  bandwidth  of  the  channel  such  that  each  subcarrier 
experiences  independent,  slowly  varying,  flat  fading.  The 
signal  model  and  receiver  structure  are  similar  to  the  previ¬ 
ous  direct  sequence  section  and  are  described  in  [11].  The 
probability of bit error expression is also found in [11].


4.  NUMERICAL  RESULTS 

We compare the performance of DS-CDMA and MC-CDMA
with  increasing  bandwidth.  We  fix  the  data  rate  of  both  sys¬ 
tems  and,  as  the  bandwidth  increases,  the  processing  gain 
( N )  of  the  DS  system  increases,  and  the  MC  signal  is  re¬ 
peated  in  more  subcarriers.  When  L  (number  of  paths/  num¬ 
ber  of  subcarriers)  equals  unity,  the  two  systems  are  identi¬ 
cal,  with  the  same  processing  gain  (64  in  our  results).  In  the 
numerical  results  to  follow,  we  use  equal  power  in  the  pilot 
and  the  data  channel,  i.e.,  A  =  B. 


To make the comparison between different bandwidths, we keep the total
transmit power constant and the total fading power constant, that is, we
decrease the transmit power per subcarrier as the number of subcarriers
increases, and renormalize the power of each path as the number of paths
increases. Traditionally, assuming perfect channel estimation, the probability
of error improves monotonically with the bandwidth [6]. However, when there is
estimation error, the situation is different.

The probability of error is plotted against bandwidth in Figure 2 with 10
total users, the estimation interval $N_i$ equaling 2 bits, and $E_b/\eta_0$ of
3, 4, 5.2, and 7 dB. The normalized decay factor is five for the DS-CDMA
system. We have used a processing gain of 64 for each subcarrier of the MC
system, and a processing gain of 64L for the DS system. As the bandwidth
increases, the bit error rate first improves and then degrades. The increasing
L helps performance by introducing diversity gain. At the same time, as L goes
up, the SNR available for each estimate goes down; this causes more estimation
error, and in turn, results in performance degradation. Thus, an optimal value
of L exists. When we increase the $E_b/\eta_0$, the optimal L becomes larger,
because the higher $E_b/\eta_0$ reduces the degradation due to the estimation
error.

The MC system performs better than the DS system. All the subcarriers in the
MC system have equal SNR. In the DS system, the multipaths follow an
exponential profile. Some of the paths have higher SNR than the MC system, and
others have lower SNR. However, the paths with lower SNR hurt performance more
than the paths with higher SNR help. Therefore, the overall performance of the
DS system is worse than that of the MC system.

In Figure 3, we have plotted the probability of error for estimation intervals
of 2, 3, and 4 bits. The increasing estimation intervals reduce the channel
estimation error; this allows the optimal L to be higher.

Fig. 4. Probability of error versus bandwidth for different normalized decay
factors.

In Figure 4, we have plotted the probability of error for different normalized
decay factors of the DS system. For a small decay factor, the DS system
performs similarly to the MC system, but as the decay factor becomes larger,
the DS system becomes much worse than the MC system.

We have approximated the performance of our DS-CDMA system by a system with
uncorrelated estimates. This approximation is valid for large processing gain.
We have also neglected self-interference in our DS-CDMA system, assuming a
large number of users. We examine the effects of these approximations in
Figure 5. For a moderate number of users (K = 10) and moderate processing gain
(N = 16L), the simulation results match our theoretical expression very well.
For a very small number of users (K = 2) and moderate processing gain
(N = 16L), the simulation results still match quite well with the theoretical
expression. When the number of users and the processing gain are both very
small (K = 2, N = 4L), the effects of both approximations become apparent, and
there are discrepancies between simulation and theoretical results.
ABSTRACT

Linear Precoding consists in multiplying a K-dimensional vector, obtained by
serial to parallel conversion of a symbol sequence to be transmitted, by an
$N \times K$ matrix. In this paper, the performance of MMSE receivers for
certain large random isometric precoded OFDM systems on fading channels is
analyzed. Using new tools borrowed from Free Probability Theory, it can be
shown that the Signal to Interference plus Noise Ratio at the equalizer output
converges almost surely to a deterministic value depending on the probability
distribution of the channel coefficients when $N \to +\infty$ and
$K/N \to \alpha \le 1$. These asymptotic results are used to address the
trade-off between Convolutional Coding and Linear Precoding while preserving a
simple MMSE equalization scheme at the receiver.

1.  INTRODUCTION. 

A multi-carrier OFDM system [1] using a Cyclic Prefix for preventing
inter-block interference is known to be equivalent to multiple flat fading
parallel transmission channels in the frequency domain. In such a system, the
information sent on some carriers might be subject to strong attenuations and
could be unrecoverable at the receiver. This has motivated the proposal of more
robust transmission schemes combining the advantages of CDMA with the strength
of OFDM, known as OFDM-CDMA [2], in which the information is precoded across
all the carriers by a precoding matrix. This combination increases the overall
frequency diversity of the modulator, so that unreliable carriers can still be
recovered by taking advantage of the subbands enjoying a high Signal to Noise
Ratio (SNR). Although originally proposed for a multiuser access scheme, this
concept is extended to all single user OFDM systems and is referred to in the
sequel as Linear Precoded OFDM (LP OFDM) [3]. The LP OFDM equivalent frequency
model scheme is depicted in figure 1, in which the input symbol stream is
serial to parallel converted, then the resulting $K$-dimensional symbol vector
$s(n)$ (a white vector process with $E(s(n)s^H(n)) = I_K$) is multiplied by an
isometric $N \times K$ matrix $W_N$ (i.e. $W_N^H W_N = I_K$) where $N \ge K$.
This $N$-dimensional vector $W_N s(n)$ is parallel to serial converted, and the
corresponding generated data stream is sent across a frequency non selective
Rayleigh fading channel. After serial to parallel conversion, the
$N$-dimensional received vector $y(n)$ can be written as:

$$y(n) = H_N(n) W_N s(n) + n(n) \qquad (1)$$

where  n(n)  is  a  white  additive  Gaussian  noise  such 
that  E  (n(n)n(n)H)  —  (Tv,  and  where  H y(n)  = 


$\mathrm{diag}([h_1(n), \ldots, h_N(n)])$ is the $N \times N$ diagonal complex
matrix bearing on its diagonal the channel gains.


Fig. 1. System Model


Note that Giraud and Belfiore [4], and then Boutros and Viterbo [5], already
introduced such a scheme, called signal space diversity. An important problem
lies in the choice of the amount of redundancy introduced by Linear Precoding,
i.e. the ratio $K/N$, and also in the choice of matrix $W_N$. [4] and [5]
considered the case where $K/N = 1$, i.e. $W_N$ is unitary. They assumed the
entries of $H_N(n)$ independent and identically distributed, and proposed to
derive an upper bound of the error probability for the Maximum Likelihood (ML)
detector of $s(n)$. They discovered that, at least for high signal to noise
ratios, $W_N$ has to be chosen in such a way that the minimum L-distance
product of the constellation $\{W_N s_i, i \in I\}$, where $\{s_i, i \in I\}$
denotes the set of possible values taken by vector $s(n)$, is maximized. More
recently, Wang and Giannakis ([6], see also [3]) generalized these results to
the case $K < N$ when the covariance matrix of the random vector
$h(n) = (h_1(n), \ldots, h_N(n))^T$ is rank deficient.

The high computational cost of the ML detector prevents its use in practical
contexts. Due to its lower complexity, MMSE detection is often preferred.
Therefore, in this paper, we study the impact of the choice of $W_N$ and of
parameter $K/N$ on the asymptotic performance of the MMSE receiver when $N$ and
$K$ converge toward $+\infty$ in such a way that $K/N \to \alpha \le 1$.
Several papers [7] [8] have recently analyzed the behaviour of the SINR at the
output of the MMSE detector when the entries of $W_N$ are independent and
identically distributed random variables (to be referred to in the sequel as
the i.i.d. case) and in the case where $H_N(n)$ is reduced to $I_N$. The
originality of our contribution lies in the fact that the linear random
precoder $W_N$ is isometric instead of being i.i.d. Such a choice is justified
by the fact that isometric precoders provide much better results than i.i.d.
ones, as will be seen below.

From a technical standpoint, the i.i.d. case study of [8] is based on
mathematical results that concern the limiting distribution of eigenvalues of
some large random matrices with independent and identically distributed entries
(see e.g. [9]). The results given here rely on the so-called Free Probability
Theory initially developed by D. Voiculescu. Due to the lack of space, the
corresponding tools and derivations are not introduced here. The interested
reader may consult [10] (this paper can be downloaded at
http://www-syscom.univ-mlv.fr/~loubaton/index.html). Note that Evans and Tse
already introduced free probability theory in [7], but for solving quite
different problems.

2.  ASYMPTOTICAL  SINR  PERFORMANCE 

In this section, the SINR of Linear Precoded OFDM systems designed with random
matrices is derived. Since the time index $n$ is irrelevant in the following,
we simply discard it from now on. We assume that
$H_N = \mathrm{diag}([h_1, \ldots, h_N])$ has identically distributed centered
random diagonal entries. $|h_i|^2$ is supposed to have a probability density
$p(t)$ with finite moments of all orders. We set $E(|h_i|^2) = 1$, so that
$\lambda$ represents the inverse of the SNR at the receiver input. Notice that
the random variables $\{h_i\}_{i=1,N}$ are not assumed to be independent.
However, we assume that for each $l \ge 1$,

$$\lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} |h_i|^{2l} = \int_0^{\infty} t^l p(t)\, dt \quad \text{almost surely,} \qquad (2)$$

which implies some kind of asymptotic independence between the random
variables $h_i$ and $h_j$ if $|i - j| \to \infty$. This hypothesis is quite
realistic in the context of LP OFDM schemes if large time
interleavers/deinterleavers are inserted in the scheme represented in figure 1.
We stress that the ergodicity relation expressed by (2) deeply influences our
results.

We now explain how the random matrix $W_N$ is generated. A random unitary
matrix is said to be Haar distributed if its probability distribution is
invariant under left multiplication by constant unitary matrices (this
invariance condition specifies the distribution). Such a matrix can be
generated in the following way: let $X = [x_{i,j}]_{1 \le i,j \le N}$ be an
$N \times N$ random matrix with independent complex Gaussian centered unit
variance entries. Then the unitary matrix $X (X^H X)^{-1/2}$ is Haar
distributed (see [10]). $W_N$ is generated by extracting any $K$ columns from
an $N \times N$ Haar distributed unitary matrix independent of $H_N$.
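
The construction can be illustrated with a minimal sketch. It is not the $X (X^H X)^{-1/2}$ recipe quoted above: it swaps in Gram-Schmidt orthonormalisation of $K$ Gaussian columns, which is a standard alternative way of obtaining the first $K$ columns of a Haar distributed unitary matrix. The function names and the sizes $N = 8$, $K = 4$ are illustrative assumptions.

```python
import random

def gaussian_matrix(n, k, rng):
    """n x k matrix of independent centered complex Gaussian entries, unit variance."""
    s = 0.5 ** 0.5
    return [[complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(k)]
            for _ in range(n)]

def isometric_precoder(n, k, rng):
    """Columns of an n x k isometric matrix (assumed stand-in for W_N).

    Gram-Schmidt on Gaussian columns is used here instead of the
    X (X^H X)^{-1/2} construction described in the text.
    """
    x = gaussian_matrix(n, k, rng)
    cols = [[x[i][j] for i in range(n)] for j in range(k)]
    basis = []
    for v in cols:
        for _ in range(2):  # second pass only improves numerical orthogonality
            for q in basis:
                proj = sum(qi.conjugate() * vi for qi, vi in zip(q, v))
                v = [vi - proj * qi for qi, vi in zip(q, v)]
        norm = sum(abs(vi) ** 2 for vi in v) ** 0.5
        basis.append([vi / norm for vi in v])
    return basis

def gram(cols):
    """W^H W for a matrix given as a list of columns."""
    return [[sum(a.conjugate() * b for a, b in zip(ci, cj)) for cj in cols]
            for ci in cols]

rng = random.Random(0)
W = isometric_precoder(8, 4, rng)   # N = 8, K = 4 chosen only for illustration
G = gram(W)                         # should be close to the 4 x 4 identity
```

The Gram matrix `G` checks the isometry condition $W_N^H W_N = I_K$ numerically.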

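Before going further, let us recall the expression of the SINR at one of the
$K$ outputs of the MMSE detector. Let $w_N$ denote the column of $W_N$
associated with that output. The SINR $\beta_{w_N}$ is easily shown to be
$\beta_{w_N} = \frac{\eta_{w_N}}{1 - \eta_{w_N}}$, where:

$$\eta_{w_N} = w_N^H H_N^H \left(H_N W_N W_N^H H_N^H + \lambda I_N\right)^{-1} H_N w_N .$$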

We  are  now  in  position  to  state  the  main  results  of  this  contribution: 

Theorem 1 (Isometric Case) Assume that matrices $W_N$ and $H_N$ are chosen as
above and moreover, that the probability density $p(t)$ of the random variables
$|h_i|^2$ has a compact support included in the interval $[0, c]$ (which
implies that $\sup_{i \in \mathbb{N}} |h_i|^2 \le c < \infty$ almost surely).

When $N$ grows towards infinity and $K/N \to \alpha \le 1$, the SINR
$\beta_{w_N}$ at the output of a MMSE equalizer converges almost surely to a
value $\beta$ that is the unique solution of the equation

$$\int_0^{\infty} \frac{t}{\alpha t + \lambda(1-\alpha)\beta + \lambda}\, p(t)\, dt = \frac{\beta}{\beta+1}. \qquad (3)$$


The first important conclusion provided by this result is that the SINR
$\beta_{w_N}$ converges to a deterministic value depending only on the channel
coefficients distribution, but not on the particular channel realization.
Relation (2) plays a fundamental role in this respect. In some sense, the
precoded system equipped with a MMSE receiver transforms a flat fading Rayleigh
channel into a Gaussian channel with signal to noise ratio $\beta$. This
confirms the results of [5] stated in the case of the maximum likelihood
detector, and the observations made in [11] and [12] in the context of MC-CDMA
systems.

The second conclusion is that the SINR limit does not depend on the particular
realization of $W_N$ either. This suggests that for large blocks, it is
irrelevant to optimize the performance of a MMSE receiver with respect to
$W_N$. In the statement of theorem 1, $p(t)$ is assumed to be compactly
supported. This technical hypothesis is needed because the most powerful
results of free probability theory applied to random matrices require compactly
supported measures. Although the usual channel probability distributions like
the Rayleigh or the Rice distributions are not compactly supported, in
practice, formula (3) predicts quite well the performance of our precoded
modulation scheme using MMSE detection.

Since the proof of (3) is non trivial and needs the use of sophisticated
arguments, only an outline is provided. We have shown in [10] that
$\eta_{w_N}$ converges almost surely to a value $\bar\eta$ which does not
depend on the choice of $w_N$. For a given $N$, there are $K$ quantities
$\eta_{w_N}$, each corresponding to the choice of a particular column code in
$W_N$. Their sum over all the columns of this matrix is
$\mathrm{trace}\left((H_N W_N W_N^H H_N^H + \lambda I_N)^{-1} H_N W_N W_N^H H_N^H\right)$.
Hence, the limit value $\bar\eta$ is given by:

$$\bar\eta = \lim_{N \to \infty} \frac{1}{K}\, \mathrm{trace}\left((H_N W_N W_N^H H_N^H + \lambda I_N)^{-1} H_N W_N W_N^H H_N^H\right)$$

By applying free probability results, we can show that the empirical eigenvalue
distribution of $H_N W_N W_N^H H_N^H$ converges almost surely to a compactly
supported measure $\theta$, which can be derived explicitly. Therefore,
$\eta_{w_N}$ converges almost surely to
$\frac{1}{\alpha} \int \frac{t}{t + \lambda}\, d\theta(t)$.

This also shows that $\beta_{w_N}$ converges to a deterministic value
$\frac{\bar\eta}{1 - \bar\eta}$, denoted $\beta$, which is the solution of (3).

For the sake of comparison, it is useful to give the expression of the
asymptotic SINR when $W_N$ is i.i.d.:

Theorem 2 (i.i.d. Case) Assume that the entries of $W_N$ are centered i.i.d.
random variables with variance $1/N$, that the elements $h_j$ are identically
distributed random variables such that $|h_j|^2$ has a probability density
$p(t)$ with compact support, and that for each bounded continuous function
$\varphi$,

$$\lim_{N \to \infty} \frac{1}{N} \sum_{j=1}^{N} \varphi(|h_j|^2) = E\left(\varphi(|h|^2)\right) = \int \varphi(t)\, p(t)\, dt$$

almost surely.

When $N$ grows towards infinity and $K/N \to \alpha \le 1$, the SINR
$\beta_{w_N}$ at the output of a MMSE equalizer converges almost surely to a
value $\beta_i$ that is the unique solution of the equation

$$\int_0^{\infty} \frac{t}{\alpha t + \lambda \beta_i + \lambda}\, p(t)\, dt = \frac{\beta_i}{\beta_i + 1}. \qquad (4)$$

This result is a direct consequence of corollary 1 in [7] and the main result
of [9]. The two main conclusions stated in the isometric case remain valid in
the i.i.d. case. However, we observe that for a fixed value of $\alpha$,
$\beta_i \le \beta$ because for each $\beta > 0$,

$$\int_0^{\infty} \frac{t}{\alpha t + \lambda\beta + \lambda}\, p(t)\, dt \le \int_0^{\infty} \frac{t}{\alpha t + \lambda(1-\alpha)\beta + \lambda}\, p(t)\, dt$$

Moreover, the advantage of the isometric case over the i.i.d. case in terms of
asymptotic MMSE receiver performance grows as $\alpha$ gets close to 1.
Conversely, $\beta_i \approx \beta$ if $\alpha$ is close to 0.
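
As a numerical illustration of equations (3) and (4), the following sketch solves both fixed-point equations by bisection. It assumes a Rayleigh channel, so that $|h|^2$ has density $p(t) = e^{-t}$, approximated by a truncated midpoint rule; the function names, the truncation at $t = 40$ and the values $\alpha = 0.5$, $\lambda = 0.1$ are illustrative choices, not taken from the paper.

```python
import math

def rayleigh_nodes(n=4000, t_max=40.0):
    """Midpoint-rule nodes (t, weight) for |h|^2 with density p(t) = exp(-t)."""
    step = t_max / n
    return [((k + 0.5) * step, math.exp(-(k + 0.5) * step) * step)
            for k in range(n)]

def asymptotic_sinr(alpha, lam, nodes, isometric):
    """Solve (3) (isometric) or (4) (i.i.d.) for the limiting SINR by bisection."""
    c = (1.0 - alpha) if isometric else 1.0

    def gap(beta):
        # Left-hand side minus right-hand side; decreasing in beta.
        lhs = sum(w * t / (alpha * t + lam * c * beta + lam) for t, w in nodes)
        return lhs - beta / (beta + 1.0)

    lo, hi = 0.0, 1.0
    while gap(hi) > 0.0:          # enlarge the bracket until the sign changes
        hi *= 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Gaussian channel: |h|^2 is concentrated at 1, and (3) gives SINR = 1 / lambda.
beta_gauss = asymptotic_sinr(0.5, 0.1, [(1.0, 1.0)], isometric=True)

nodes = rayleigh_nodes()
beta_iso = asymptotic_sinr(0.5, 0.1, nodes, isometric=True)
beta_iid = asymptotic_sinr(0.5, 0.1, nodes, isometric=False)
```

With these inputs `beta_iid` does not exceed `beta_iso`, in line with the inequality above.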

3.  INFLUENCE  OF  CHANNEL  AND  PRECODER 

In this section, we provide further insight into the effects of the fading
channel and of the nature of the precoder (i.i.d. versus isometric) on the SINR
performance.

Effect of the fading on the performance in the i.i.d. case. We first study the
effect of the fading in the i.i.d. case, and compare (4) with the results of
[8] obtained in the Gaussian channel case. For that purpose, rewrite (4) as

$$\beta_i = E_{|h|^2}\left[ \frac{1}{\frac{\alpha}{\beta_i + 1} + \frac{\lambda}{|h|^2}} \right] \qquad (5)$$

where $E_{|h|^2}$ denotes the mathematical expectation with respect to the
$|h|^2$ distribution. If $N$ is large enough, the right-hand side of (5) can be
approximated by the empirical mean
$\frac{1}{N} \sum_{j=1}^{N} \frac{1}{\frac{\alpha}{\beta_i + 1} + \frac{\lambda}{|h_j|^2}}$.
In the Gaussian channel case, the distribution of $|h|^2$ is reduced to
$\delta(1)$, and equation (5) coincides with the result given in [8]. Recall
that Tse and Hanly interpreted in [8] the factor $\frac{1}{\beta_i + 1}$ as the
effective interference of user $l$ on the desired user $k$ at the desired
target SINR $\beta_i$. The term
$\frac{\alpha}{\beta_i + 1} = \frac{K}{N} \frac{1}{\beta_i + 1}$ thus
represents the total amount of multiuser interference at the output of the MMSE
receiver (the term $\frac{1}{\beta_i + 1}$ is multiplied by $K$, the number of
users, while the coefficient $\frac{1}{N}$ is due to the spreading gain
provided by the precoder). It is interesting to note that in the fading channel
case, the right-hand side of (5) coincides with an averaged version (over the
amplitude of the channel coefficients) of the inverse of the sum of the same
interference term and of the term $\frac{\lambda}{|h|^2}$, which represents the
contribution of a thermal noise of variance $\frac{\lambda}{|h|^2}$ in a
Gaussian channel. The diversity provided by the precoder is of course due to
the averaging over the values taken by $|h|^2$ in (5).

Comparison between i.i.d. and isometric precoders. In order to compare the two
kinds of precoders, rewrite (3) as

$$\beta = E_{|h|^2}\left[ \frac{1}{\frac{\alpha}{\beta + 1} + \frac{\lambda}{|h|^2}\left(1 - \alpha \frac{\beta}{\beta + 1}\right)} \right] \qquad (6)$$

It is interesting to note that the second terms of the right-hand sides of (5)
and (6) are similar. The multiuser interference term appears in both formulas,
while the term $\frac{\lambda}{|h|^2}$, representing the effect of the thermal
noise in the i.i.d. case, is multiplied in the isometric case by
$1 - \alpha \frac{\beta}{\beta + 1}$, which is of course smaller than 1. In
other words, for a given target SINR of $\beta$, an isometric precoded system
corrupted by a thermal noise of variance $\lambda$ provides the same
performance as an i.i.d. precoded one corrupted by a thermal noise of variance
$\left(1 - \alpha \frac{\beta}{\beta + 1}\right)\lambda$. We note in particular
that the attenuation factor becomes more favorable as $\alpha$ gets close to 1.

4.  SYSTEM  DESIGN  IMPLICATIONS 

Simulations have been performed assuming a QPSK constellation, independent
Rayleigh channel attenuations and perfect channel knowledge at the receiver.
Figure 2 shows the BER in the isometric case for various spectral efficiencies
$\alpha$ (including $\alpha = 1$). The curves closely match the simulation
results using a realistic number of subchannels ($N = 256$). The "Gaussian
Channel" curve is provided as a reference and corresponds to $H_N = I_N$. In
this situation, the receiver output SINR is easily shown to be $1/\lambda$.

In order to determine the optimal amount of redundancy that should be spent on
Linear Precoding, the throughput of an LP OFDM system equipped with a MMSE
receiver is analyzed. The throughput $\gamma(\alpha, \lambda)$ is the total
number of bit/s/Hz that can be reliably transmitted with this system. It is
defined by $\gamma(\alpha, \lambda) = \alpha C(\alpha, \lambda)$ where the
capacity $C(\alpha, \lambda)$ is given by
$C(\alpha, \lambda) = \log_2(1 + \mathrm{SINR}(\alpha, \lambda))$ and
$\mathrm{SINR}(\alpha, \lambda)$ is $\beta$ or $\beta_i$. Note that
$E_b/N_0 = (C\lambda)^{-1}$ (see [13] for more details). Figure 3 shows the
behaviour of the optimum value of $\alpha$ (i.e. for which the throughput is
maximum) with respect to $E_b/N_0$ for both isometric and i.i.d. cases. For
maximizing the throughput, nearly no redundancy should be spent on the Linear
Precoder in the isometric case. In contrast, in the i.i.d. case, a significant
amount of redundancy is required when $E_b/N_0 > 4$ dB. Figure 4 shows the
maximum throughput vs $E_b/N_0$ for isometric and i.i.d. linear precoders. The
throughput for a Gaussian channel is also provided. Isometric precoding
increases the throughput with respect to i.i.d. precoding.
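
The throughput definition can be explored with a small sketch that scans $\alpha$ on a grid and evaluates $\gamma(\alpha, \lambda) = \alpha \log_2(1 + \mathrm{SINR}(\alpha, \lambda))$ for both precoder types. It assumes a Rayleigh channel ($|h|^2$ exponentially distributed, integrated with a truncated midpoint rule) and, for simplicity, fixes $\lambda$ rather than $E_b/N_0$; the grid, the value $\lambda = 0.1$ and all function names are illustrative assumptions, so the output is not meant to reproduce the paper's figures.

```python
import math

def limiting_sinr(alpha, lam, nodes, isometric):
    """Bisection on equation (3) (isometric) or (4) (i.i.d.)."""
    c = (1.0 - alpha) if isometric else 1.0

    def gap(beta):
        lhs = sum(w * t / (alpha * t + lam * c * beta + lam) for t, w in nodes)
        return lhs - beta / (beta + 1.0)

    lo, hi = 0.0, 1.0
    while gap(hi) > 0.0:
        hi *= 2.0
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def throughput(alpha, lam, nodes, isometric):
    """gamma(alpha, lambda) = alpha * log2(1 + SINR(alpha, lambda))."""
    return alpha * math.log2(1.0 + limiting_sinr(alpha, lam, nodes, isometric))

# Midpoint-rule nodes for the assumed density p(t) = exp(-t) on [0, 40].
step = 0.02
nodes = [((k + 0.5) * step, math.exp(-(k + 0.5) * step) * step)
         for k in range(2000)]

lam = 0.1                                  # hypothetical inverse SNR
alphas = [0.1 * m for m in range(1, 11)]   # alpha = 0.1, 0.2, ..., 1.0
iso = [throughput(a, lam, nodes, True) for a in alphas]
iid = [throughput(a, lam, nodes, False) for a in alphas]
best_iso = alphas[iso.index(max(iso))]     # grid point maximising the isometric throughput
best_iid = alphas[iid.index(max(iid))]     # same for the i.i.d. precoder
```

Because the limiting SINR of the isometric precoder is never below that of the i.i.d. one, each entry of `iso` is at least the matching entry of `iid`.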

This throughput analysis is now used to study the performance of a system
where Linear Precoding of rate $\alpha$ is combined with a classical
Convolutional Coding of rate $R$. Assuming an overall coding rate $\alpha R$ of
1/2, the purpose is to determine the optimum balance between $\alpha$ and $R$.
Figure 3 suggests that when isometric precoding is used,
$\alpha_{opt} \approx 1$, and with i.i.d. precoding,
$\alpha_{opt} \approx 2/3$ for the most common values of $E_b/N_0$. The optimum
values of $R$ are thus close to 1/2 and to 3/4 respectively. These claims are
supported by figures 5 and 6.


5.  CONCLUSION 

This contribution extends the pioneering work of [8] devoted to asymptotic
performance analysis of DS-CDMA systems employing i.i.d. signatures. Here, the
theoretical asymptotic SINR at the output of a MMSE receiver with isometric
Linear Precoding is derived using new mathematical tools, borrowed from the
so-called Free Probability theory. It is shown that in a system where isometric
Linear Precoding is combined with Convolutional Coding, nearly no redundancy
should be spent on LP. However, for Linear i.i.d. Precoders, redundancy is
required at the emitter side. Finally, in all cases, isometric Linear Precoders
outperform i.i.d. ones. We finally remark that these results do not contradict
those of [6] devoted to the study of maximum likelihood receivers of precoded
OFDM systems. The authors of [6] found redundant precoders ($\alpha < 1$)
useful in the isometric case because they did not assume the presence of an
interleaver/deinterleaver structure. The ergodicity assumptions (2) are
therefore not valid in [6], and the signal to interference plus noise ratio
converges toward a value depending on the particular realizations of the random
variables $(h_k)_{k > 0}$.
Fig. 2. Probability of error
ABSTRACT

In this paper, we consider the problem of joint synchronization and channel
estimation for orthogonal frequency division multiplexing (OFDM) systems. A new
algorithm is proposed that estimates the channel and symbol timing
simultaneously by using a technique based on maximum-likelihood (ML) theory and
the generalized Akaike information criterion (GAIC). Finally, we demonstrate
the performance of our algorithm by simulation results.

1.  OFDM 

The OFDM access technology is based on the transmission of data packets, each
of which consists of a number of consecutive OFDM symbols. Each OFDM symbol x
has a length of N samples and carries a certain number of information bits or
training data (that is, known data that are used to assist the demodulator). An
OFDM symbol is created by taking the inverse discrete Fourier transform (DFT)
of N data symbols (taken from a finite constellation A, such as BPSK, QPSK or
QAM). Furthermore, each OFDM symbol is preceded by a cyclic prefix (CP) (also
called guard interval (GI)) of length M that is an exact replica of the M last
samples of the OFDM symbol. The reason for this (as will become apparent below)
is that demodulation in the presence of frequency-selective fading can then be
carried out very easily. Before proceeding, note that when two (or more)
identical OFDM symbols are transmitted directly after each other, the tail of
the first symbol can serve as the CP for the second.

In the WLAN standard recently adopted by the IEEE 802.11 standardization group
[1], each data packet consists of a preamble and a data carrying part. The
preamble consists of 10 “short” identical known OFDM symbols of length
$N_s = 16$ concatenated with 2 “long” identical and known OFDM symbols of
length $N_l = 64$ which are all utilized for carrier frequency offset (CFO)
correction, channel estimation and synchronization. The data carrying part
consists of a variable number of OFDM symbols of length $N_d = 64$, where each
OFDM symbol contains useful information plus some known pilot bits, which are
typically used for updating the phase of the channel estimates. An OFDM packet
for the IEEE 802.11 standard is depicted in Figure 1. Note that $t_1$ serves as
a CP for $t_2$, $t_2$ is the CP for $t_3$, and so on. For the long symbols in
the preamble, GI2 is the CP for $T_1$ and it contains the 32 last samples of
$T_1$.

Let $y = [y_1 \cdots y_N]^T$ be a vector of N data symbols taken from A (the
elements of y are sometimes referred to as sub-carriers) and let W be a DFT
matrix of size N x N, that is, element k,l of W is equal
to  W k, i  =  $  1 .  Then  the  OFDM  symbol  x 

corresponding to the data $y$ is computed by taking the inverse DFT of $y$,
viz. $x = W^* y$, where $(\cdot)^*$ denotes the conjugate transpose. The CP
$\tilde{x}$ corresponding to $x$ contains the $M = 16$ last samples of $x$, a
relation that can be expressed as $\tilde{x} = T_M W^* y$ where $T_M$ consists
of the last $M$ rows of the $N \times N$ identity matrix.
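
The symbol construction above can be sketched as follows. The inverse DFT is written with a $1/N$ factor, which is an assumption about the normalisation; the BPSK pattern and the function names are purely illustrative.

```python
import cmath

def idft(y):
    """Inverse DFT with a 1/N factor (the normalisation is an assumption here)."""
    n = len(y)
    return [sum(y[k] * cmath.exp(2j * cmath.pi * k * m / n) for k in range(n)) / n
            for m in range(n)]

def add_cyclic_prefix(x, m):
    """Prepend the last m samples of x, i.e. the prefix T_M x followed by x."""
    return x[-m:] + x

N, M = 64, 16
y = [1 if (k * 7) % 3 else -1 for k in range(N)]   # arbitrary BPSK data symbols
x = idft(y)                                        # OFDM symbol
packet = add_cyclic_prefix(x, M)                   # CP + OFDM symbol, N + M samples
```

The first `M` samples of `packet` are an exact copy of its last `M` samples, which is the property the demodulator relies on.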

Assume that the effect of the propagation channel can be described by a finite
impulse response (FIR) filter with an effective length $L < M + 1$ and impulse
response $\{h_0, \ldots, h_{L-1}\}$. For reasons that will be apparent later,
we augment the channel impulse response with $M - L$ zeros and define

$$h = [h_0 \cdots h_{M-1}]^T = [h_0 \cdots h_{L-1}\ 0 \cdots 0]^T$$
To illustrate the demodulation procedure, we write the received signal as
(neglecting receiver noise)

$$r = H W^* y = H x$$

The matrix $H$ is of dimension $N \times N$ and circulant, so the DFT of the
received data vector $r$ can be written [9]

$$W r = W H W^* y + \text{noise} = \Lambda y + \text{noise} \qquad (1)$$

where $\Lambda = \mathrm{diag}\{\delta_1, \ldots, \delta_N\}$ is a diagonal
matrix containing the DFT of the channel impulse response $h$, that is,
$[\delta_1 \cdots \delta_N]^T = N W h$. Another way to see this is that the
cyclic prefix gives the effect of the propagation channel an interpretation in
terms of circular convolution; however, we prefer to remain in the matrix
algebra framework. From (1) we see that, provided the channel $\Lambda$ is
known, each data bit $y_n$ in the OFDM symbol under consideration can be
estimated as $\hat{y}_n = P_A\left([W r]_n / \delta_n\right)$ where
$P_A(\cdot)$ denotes projection onto the alphabet A, and $[\cdot]_n$ denotes
the nth element of a vector. This common and simple demodulator can be
implemented by one single FFT.
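
The effect of the cyclic prefix and the one-tap demodulator can be checked with a short noise-free sketch. It assumes an unnormalised DFT paired with a $1/N$ inverse DFT, BPSK data, and a hypothetical three-tap channel; the sizes $N = 16$, $M = 4$ are chosen only to keep the example small.

```python
import cmath

def dft(v):
    """Unnormalised DFT (the scaling convention is an assumption of this sketch)."""
    n = len(v)
    return [sum(v[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]

def idft(v):
    """Inverse DFT with a 1/N factor, matching dft() above."""
    n = len(v)
    return [sum(v[k] * cmath.exp(2j * cmath.pi * k * m / n) for k in range(n)) / n
            for m in range(n)]

def fir_channel(samples, h):
    """Linear convolution truncated to len(samples); earlier samples are taken as zero."""
    return [sum(h[l] * samples[t - l] for l in range(len(h)) if t - l >= 0)
            for t in range(len(samples))]

N, M = 16, 4
h = [1.0, 0.5, -0.25]                      # hypothetical channel, L = 3 <= M
y = [1, -1, -1, 1, 1, 1, -1, 1, -1, -1, 1, -1, 1, 1, -1, -1]
x = idft(y)
tx = x[-M:] + x                            # cyclic prefix followed by the symbol
r = fir_channel(tx, h)[M:]                 # the receiver discards the prefix
delta = dft(h + [0.0] * (N - len(h)))      # DFT of the zero-padded impulse response
# One-tap equalisation followed by projection onto the BPSK alphabet.
y_hat = [1 if (z / d).real >= 0 else -1 for z, d in zip(dft(r), delta)]
```

After the prefix is dropped, the linear convolution acts on the symbol as a circular one, so dividing each DFT bin by the channel response recovers the data.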


2.  ML  CHANNEL  ESTIMATION 

Channel estimation for OFDM is discussed in some detail in [2, 3, 6], so we
merely summarize some results using our notation and framework. Assume that the
received data $r$ has been adjusted to compensate for a possible CFO [4], that
a proper timing $T$ is obtained and that the effective channel length $L$ is
known. Consider first the estimation of the channel $h(L)$ (we use the index
$L$ to emphasize that the last $M - L$ elements of $h$ are zero) based on a
least-squares (LS) criterion using received data corresponding to the first
(known) long OFDM symbol in the preamble. Denote the $N \times 1$ vector of the
known data symbols in the long OFDM preamble symbol by $p$. Then LS channel
estimation (conditioned on the timing $T$) amounts to (cf. (1))

$$\min \|W r - \Lambda(L) p\|^2$$

subject to the constraint that the effective channel length is $L$, i.e.,
$T_{M-L} h = 0$. This is equivalent to

$$\min \|W r - \mathrm{diag}\{p\} W h(L)\|^2 \qquad (2)$$

subject to $T_{M-L} h = 0$. For the symbols in the IEEE 802.11 WLAN standard,
12 of the elements of $p$ are equal to zero, and the rest belong to the
(unitary) BPSK constellation. Using this fact it is not difficult to see that
(2) has the solution (see, e.g., [7])

$$[\hat{h}_1 \cdots \hat{h}_L]^T = (\bar{W}^* \bar{W})^{-1} \bar{W}^* \tilde{W} r \qquad (3)$$

where $\tilde{W}$ equals the matrix $W$ with all rows removed for which the
corresponding element of $p$ is zero, and $\bar{W}$ equals the first $L$
columns of $\tilde{W}$. Note that
$(\bar{W}^* \bar{W})^{-1} \bar{W}^* \tilde{W}$ can be precomputed (for
different $L$) and further that in case the noise in (1) is Gaussian and white,
(3) gives the ML estimate of the channel. Having established this, it is
straightforward to show that the LS (or ML) channel estimate based on both long
OFDM symbols in the preamble is nothing but the average of the estimates based
on the first and the second symbol, respectively.
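
A least-squares estimate of this kind can be sketched directly from criterion (2). The code below is an assumption-laden illustration rather than the authors' implementation: it uses an unnormalised DFT, real BPSK pilots with two unused sub-carriers, a hypothetical three-tap channel and a noise-free observation, and it solves the normal equations with a small hand-written Gaussian elimination.

```python
import cmath

def solve(a, b):
    """Solve the square complex system a z = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(m[i][c]))   # partial pivoting
        m[c], m[p] = m[p], m[c]
        for i in range(c + 1, n):
            f = m[i][c] / m[c][c]
            m[i] = [u - f * v for u, v in zip(m[i], m[c])]
    z = [0j] * n
    for i in reversed(range(n)):
        z[i] = (m[i][n] - sum(m[i][j] * z[j] for j in range(i + 1, n))) / m[i][i]
    return z

def ls_channel(wr, p, length):
    """Least-squares estimate of the first `length` taps from Wr and pilots p."""
    n = len(p)
    rows = [k for k in range(n) if p[k] != 0]            # drop unused sub-carriers
    wbar = [[cmath.exp(-2j * cmath.pi * k * l / n) for l in range(length)]
            for k in rows]
    target = [wr[k] * p[k] for k in rows]                # valid because p_k^2 = 1
    a = [[sum(wbar[i][u].conjugate() * wbar[i][v] for i in range(len(rows)))
          for v in range(length)] for u in range(length)]
    b = [sum(wbar[i][u].conjugate() * target[i] for i in range(len(rows)))
         for u in range(length)]
    return solve(a, b)

N = 16
p = [0, 1, -1, 1, 1, -1, 1, -1, 0, 1, 1, -1, -1, 1, -1, 1]   # hypothetical pilots
h = [0.9, 0.4 - 0.2j, 0.1j]                                  # hypothetical channel
# Noise-free frequency-domain data: [Wr]_k = p_k * sum_l h_l exp(-j 2 pi k l / N).
wr = [p[k] * sum(h[l] * cmath.exp(-2j * cmath.pi * k * l / N) for l in range(len(h)))
      for k in range(N)]
h_hat = ls_channel(wr, p, 3)
```

In this noise-free setting `h_hat` reproduces `h` up to rounding error; with noise, averaging the estimates from the two long symbols would follow the remark above.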

3.  JOINT  TIMING  AND  CHANNEL 
ESTIMATION 

Once an initial timing $T_1$ is obtained that is less (earlier) than the true
timing, the channel impulse response will contain leading zeros (due to the too
early timing) and trailing zeros (provided that the effective channel length
plus the synchronization error is less than $M$). If the number of leading and
trailing zeros (or equivalently, the correct channel length and timing) can be
estimated, the number of unknown channel coefficients will decrease. Hence a
more accurate channel estimate can be obtained, which will reduce the bit error
rate (BER) in the system (this is known as the parsimony principle in the
system identification literature [7]). This is exactly the idea behind our
joint timing, channel length and channel coefficient vector estimation
algorithm.

To obtain the initial timing estimate, we use a simple correlation approach
(see, e.g., [2]) that exploits the fact that the two long OFDM symbols in the
preamble are identical. The initial timing estimate $T_1$ is determined such
that it is (with a very large probability) less than the true timing (unless
this is ensured, the channel impulse response will not contain leading zeros).
Following this, we refine the timing estimate at the same time as the channel
estimation is performed. The details of the procedure are as follows:

1. Let $T_1$ denote the sample number corresponding
to the initial timing (based on a correlation approach
[2]).

2. Fix L = 16 and increment the timing T starting
from $T = T_1$ until the criterion in (2) is minimized. Let
the so-obtained T be denoted by $T_2$.

3. Decrease L starting from L = 16 until the following
generalized Akaike information criterion (GAIC) is
minimized:

$$\ln \| W r - \mathrm{diag}\{p\}\, W h(L) \|^2 + \gamma L \qquad (4)$$

where $\gamma = 0.08$ (the rationale behind GAIC is discussed
in some detail in, e.g., [7]). Denote by $L_1$ the L
that minimizes (4).

4. Increment T (starting from $T = T_2$) and simultaneously
decrease L (starting from $L = L_1$) until
(4) is minimized. Let the so-obtained final timing and
channel length estimates be denoted by $\hat{T}$ and $\hat{L}$,
respectively.

Note  that  the  algorithm  is  iterative  but  terminates 
within  a  finite  number  of  steps. 
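
The channel-length selection in Step 3 can be sketched as follows. This is only an illustration under stated assumptions: the function name `gaic_channel_length` is hypothetical, the cost is taken to be the logarithm of the LS residual plus a penalty of 0.08 per tap, the signal model is the same assumed one used for the LS fit (diag{p} times the leading columns of an unnormalized DFT matrix), and an exhaustive search over L replaces the step-wise decrease described above. The timing search of Steps 2 and 4 is not covered.

```python
import numpy as np

def gaic_channel_length(r, p, gamma=0.08, L_max=16):
    """Return (best_L, costs), where costs[L] = ln(LS residual) + gamma * L."""
    N = len(r)
    n = np.arange(N)
    W = np.exp(-2j * np.pi * np.outer(n, n) / N)   # unnormalized DFT matrix
    used = p != 0
    b = (W @ r)[used]
    costs = {}
    for L in range(1, L_max + 1):
        A = p[used][:, None] * W[used, :L]         # assumed regressor for length L
        h_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
        resid = np.sum(np.abs(b - A @ h_hat) ** 2) # squared LS residual
        costs[L] = np.log(resid) + gamma * L       # fit term plus complexity penalty
    best = min(costs, key=costs.get)
    return best, costs
```

The penalty term makes the criterion prefer shorter channels unless the extra taps reduce the residual noticeably, which is the behaviour the text attributes to the parsimony principle.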

4.  PHASE  CORRECTION  BASED  ON 
PILOT  SYMBOLS 

The  received  signal  will  inevitably  suffer  from  a  CFO, 
which can be estimated and corrected for using methods
such as those in [3, 2, 4]. These methods estimate
the  CFO  based  on  the  received  data  in  the  preamble 
only,  and  despite  being  statistically  sound,  they  will 
never  be  perfectly  accurate.  The  remaining  CFO  er¬ 
ror  results  in  a  phase  error  that  increases  linearly  with 
time.  As  a  remedy  to  this  problem,  we  perform  an 
additional  phase  correction  for  each  OFDM  symbol  to 
compensate  for  the  (small)  remaining  CFO  error. 

Each OFDM symbol contains 4 known pilot symbols.
Let $q$ be a $4 \times 1$ vector of these pilot symbols,
and let $z$ be the corresponding 4 elements of the DFT
of the received data, i.e., of $Wr$. For each OFDM
symbol, we estimate a channel phase correction $\phi$ by
minimizing the LS criterion $\| z - q e^{j\phi} \|^2$, which has the
solution $\hat{\phi} = \arg(q^* z)$. This phase correction is used to
obtain a compensated received signal $\tilde{r} = e^{-j\hat{\phi}} r$, upon
which the detection of the data symbols is based. As
we illustrate below, this phase correction can have a
significant influence on the performance.
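
The pilot-based phase correction amounts to two lines of NumPy. The sketch below uses hypothetical function names (`pilot_phase`, `compensate`) and assumes that the pilots and the corresponding DFT outputs are already available as arrays.

```python
import numpy as np

def pilot_phase(q, z):
    """LS phase estimate: the angle of q^H z (np.vdot conjugates its first argument)."""
    return np.angle(np.vdot(q, z))

def compensate(r, phi):
    """Rotate the received samples back by the estimated phase."""
    return np.exp(-1j * phi) * r
```

For example, if the four received pilot values equal the known pilots rotated by 0.3 rad, `pilot_phase` returns 0.3 (up to rounding) and `compensate` undoes the rotation.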

5.  NUMERICAL  EXAMPLES 

We provide a few Monte-Carlo simulation results to
illustrate the effectiveness of our new algorithm. In
all simulations, we consider a Rayleigh fading channel
according to [8], with $L = 6$ Gaussian distributed coefficients
$h_l$ having a mean power of $\sigma_l^2 = E[|h_l|^2] =
\sigma_0^2 e^{-\alpha l}$ for $l = 1, \ldots, L$, where $\sigma_0$ is such that
$\sum_{l=1}^{L} \sigma_l^2 = 1$ and $\alpha = 5/3$. The channel is fixed during
the transmission of one packet but independent from
one packet to another. A CFO of 0.025 Hz is introduced
in the simulation and a simple algorithm based on the
phase of the correlation of two subsequent OFDM symbols
in the preamble is applied to estimate and remove
the CFO error (see, e.g., [4]). White Gaussian noise is
added to the data to simulate a received signal with a
certain ratio of energy per information bit to the spectral
density of the noise ($E_b/N_0$).

Example 1: Timing estimation. Figure 2 shows the
distribution of the different timing estimates $T_1$ (initial
coarse timing), $T_2$ (refined timing estimate from Step
2) and $\hat{T}$ (final timing estimate). The true timing is
T = 194 and $E_b/N_0$ is 14 dB. It is clear from the figure
that our algorithm succeeded in recovering the true timing
exactly in more than 90% of the realizations, and to
within a few sample intervals in virtually all test cases.

Example 2: Estimation of the effective channel
length. In Figure 3 we show the distribution of the
channel length estimates $L_1$ (after Step 3) and $\hat{L}$ (the
final channel length estimate). Note that the channel
length is underestimated in most realizations since the
last elements of the impulse response are usually very
small.

Example 3: Bit error rate (BER) for QPSK data
symbols. We illustrate the BER obtained by simulation
of an IEEE 802.11 OFDM system using (a) our
algorithm without the additional channel phase correction;
(b) our algorithm together with the additional
phase correction based on the pilot symbols; and
(c) perfect knowledge of the timing, channel and CFO.
The results are shown in Figure 4. We observe from
the figure that our synchronization and channel estimation
algorithm achieves a performance close to the
bound provided by the exact knowledge of the timing
and the transmission channel. Furthermore, it is evident
that the use of the pilot symbols is necessary to
fully compensate for the CFO.

6.  CONCLUDING  REMARK 

We have presented a novel and conceptually simple
algorithm for joint synchronization and channel estimation
for the IEEE 802.11 WLAN standard. The algorithm
is based on ML estimation and the GAIC information
theoretic criterion. Numerical examples show
that as far as the BER is concerned, our algorithm
achieves a performance close to the ultimate bound
provided by the exact knowledge of the transmission
channel, and therefore eliminates the need for more
complicated approaches to the CFO, timing and channel
estimation.

Figure 2: Distribution of the timing estimates.


Figure  3:  Distribution  of  the  channel  length  estimates. 


Figure  4:  Simulated  BER  for  QPSK  data. 
ABSTRACT

In  this  paper  we  propose  and  study  a  Sub-Channel  Selec¬ 
tive  OFDM  spread  spectrum  system  and  a  distributed  al¬ 
gorithm  for  sub-channel  allocation.  The  proposed  system 
is  a  combination  of  OFDMA  and  OFDM-CDMA  systems. 
For  each  user  the  maximal  ratio  combining  weights  of  the 
sub-channels  are  used  as  channel  information.  The  algo¬ 
rithm  implements  a  simple  but  sub-optimal  version  of  the 
water  filling  power  allocation,  thus  increasing  the  transmit¬ 
ter  power  efficiency  and  decreasing  the  interference  gener¬ 
ated  for  other  users.  This  algorithm  also  provides  the  ability 
to  offer  different  Quality  of  Service  (QoS)  levels  for  differ¬ 
ent  users.  The  convergence  of  the  algorithm  and  its  perfor¬ 
mance  have  been  studied  through  simulation.  The  proposed 
system  has  significant  advantage  in  BER  performance,  over 
the  conventional  OFDM-SS  system. 

1.  INTRODUCTION 

Among various techniques used for communication over
a wireless channel, Orthogonal Frequency Division Multiplexing
(OFDM) is one of the most promising. In an OFDM
system  the  data  stream  is  divided  into  N  parallel  streams, 
which  are  transmitted  over  N  orthogonal  sub-carriers  (sub¬ 
channels).  In  the  spread  spectrum  version  of  OFDM 
(OFDM-SS,  also  known  as  Multi  Carrier  Spread  Spectrum), 
the  same  data  bit  is  transmitted  over  all  N  sub-channels. 
OFDM-SS  achieves  frequency  diversity  as  well  as  process¬ 
ing  gain.  Furthermore,  OFDM-SS  is  robust  against  jam¬ 
ming  and  interference  and  enables  us  to  use  Code  Division 
Multiple  Access  (CDMA)  as  an  efficient  Radio  Resource 
Allocation  (RRA)  scheme,  for  a  multi  user  system.  In  an 
OFDM-CDMA  system,  orthogonal  codes  of  length  N  are 
assigned  to  each  user.  These  codes  (signatures)  are  applied 
to  the  sub-channels,  to  reduce  the  interference  generated  by 
other  users. 

At the receiver, the outputs of the sub-channels are combined
to obtain the decision variable. For linear sub-channels
with additive Gaussian noise and/or interference, Maximal
Ratio Combining (MRC) provides the optimum Signal-to-
Noise and Interference Ratio (SNIR) [1]. MRC combines
the outputs of the sub-channels by giving less weight to
sub-channels with low SNIR and more weight to those with high
SNIR, i.e., if sub-channel $i$ is modeled by

$$y_i = \alpha_i x + n_i, \qquad (1)$$

where $y_i$ is the output of the sub-channel, $x$ is the data symbol,
$\alpha_i$ is the sub-channel gain and $n_i \sim \mathcal{N}(0, \sigma^2)$ is the
additive Gaussian noise, the MRC decision variable is given
by

$$y = \sum_{i=1}^{N} w_i y_i, \qquad (2)$$

where $w_i = \alpha_i^*$. In practice, the sub-channel gains are
unknown and a Least Mean Square (LMS) or a Recursive
Least Square (RLS) algorithm with decision feedback is
employed to estimate these gains [2].
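
As a small illustration of the combining rule, the NumPy sketch below forms the MRC decision variable from known sub-channel gains. The function name `mrc_combine` is hypothetical, and the gains are assumed to be known exactly rather than estimated by LMS or RLS.

```python
import numpy as np

def mrc_combine(y, alpha):
    """MRC decision variable: sum over sub-channels of conj(alpha_i) * y_i."""
    return np.sum(np.conj(alpha) * y)
```

In the noise-free case with y_i = alpha_i x, the result is x multiplied by the sum of the squared gain magnitudes, so strong sub-channels dominate the decision variable.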

While  MRC  performs  optimum  detection,  it  does  not 
consider  the  transmitted  power  efficiency.  We  observe  that 
a  sub-channel  with  smaller  SNIR  has  smaller  contribution 
to  the  decision  variable  compared  to  one  with  high  SNIR, 
thus  less  benefit  is  gained  from  the  transmit  power  spent  on 
the  sub-channel  with  low  SNIR.  The  solution  known  as  the 
water  filling  power  distribution  [3]  gives  the  distribution  of 
the  power  over  N  sub-channels  which  results  in  maximum 
channel  capacity  for  given  total  transmission  power.  The 
water  filling  solution  allocates  more  power  to  good  (high 
SNIR)  sub-channels  and  little  or  no  power  to  the  bad  ones. 
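
For reference, the classical water filling allocation can be sketched as below. This is a generic textbook formulation, not code from the paper: each sub-channel i is assumed to have a known gain-to-noise ratio g_i, the power is p_i = max(0, mu - 1/g_i), and the water level mu is found by bisection so that the powers sum to the budget. The function name `water_filling` is hypothetical.

```python
import numpy as np

def water_filling(gains, total_power, iters=100):
    """Allocate total_power over sub-channels with gain-to-noise ratios `gains`."""
    inv = 1.0 / np.asarray(gains, dtype=float)   # "floor height" of each sub-channel
    lo, hi = 0.0, inv.max() + total_power        # the water level lies in [lo, hi]
    for _ in range(iters):
        mu = 0.5 * (lo + hi)
        if np.maximum(mu - inv, 0.0).sum() > total_power:
            hi = mu                              # too much water: lower the level
        else:
            lo = mu                              # too little water: raise the level
    return np.maximum(0.5 * (lo + hi) - inv, 0.0)
```

With gains of 10, 1 and 0.01 and a unit power budget, the weakest sub-channel receives no power at all, which matches the qualitative description in the text.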

Several algorithms have been proposed to implement the
water filling solution or a sub-optimal version of it for the
multi carrier system. Optimal power and symbol size allocation,
assuming that different sub-channels experience different
fading and other channel effects, is derived in [4]. An
adaptive sub-channel and bit allocation for the down-link of
a multi user OFDM system is proposed in [5]. A dynamic
sub-channel allocation for the down-link of an OFDM system,
which assumes quasi-static channels, is proposed in [6]. These
algorithms are computationally complex, and assume that
the perfect channel information of all the users is known by
the base station at all times.

Figure 1: Sub-Channel Selective OFDM-SS system

In  this  paper  we  propose  a  Sub-Channel  Selective  OFDM- 
SS  (SCS  OFDM-SS)  scheme  that  implements  a  sub-optimal 
but  simple  distributed  algorithm  for  the  sub-channel  power 
allocation  in  a  multi  user  OFDM  system.  The  result  is  a 
hybrid  of  OFDM-CDMA  and  OFDMA  systems  [7][8].  An 
Orthogonal  Frequency  Division  Multiple  Access  (OFDMA) 
system  is  one  in  which  each  sub-channel  is  assigned  to  one 
user. 

The  system  is  described  in  section  2.  The  proposed  dis¬ 
tributed  algorithm  is  detailed  in  section  3,  and  in  section  4 
the  convergence  of  this  algorithm  for  specific  cases  is  stud¬ 
ied.  Section  5  presents  the  simulation  results  for  the  perfor¬ 
mance  of  the  proposed  system  and  compares  them  with  the 
performance  of  the  conventional  system.  Finally  in  section 
6  conclusions  are  drawn. 


2.  SYSTEM  DESCRIPTION 

In the SCS OFDM-SS system, the magnitudes of the MRC
channel weights, $|w_i|$, are used to distinguish good
sub-channels from bad ones. At the receiver, the $M$ best
sub-channels are selected and a set of binary mask variables
is defined as $\bar{w}_i = 1$ if sub-channel $i$ is selected and
$\bar{w}_i = 0$ otherwise. These variables are then reported to the
transmitter via a feedback channel. The transmitter in turn
allocates equal power to the selected sub-channels and no
power to the unselected ones (Figure 1).
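
The selection rule can be sketched in a few lines of NumPy. The function name `select_subchannels` and the `total_power` parameter are assumptions made for this illustration; the paper only states that the selected sub-channels receive equal power.

```python
import numpy as np

def select_subchannels(w, M, total_power=1.0):
    """Return (mask, power): mask is 1 for the M largest |w_i|, power is split equally."""
    mag = np.abs(w)
    best = np.argsort(mag)[-M:]              # indices of the M strongest sub-channels
    mask = np.zeros(len(w), dtype=int)
    mask[best] = 1
    power = mask * (total_power / M)         # equal power on selected, zero elsewhere
    return mask, power
```

Only the binary mask needs to be fed back to the transmitter, which keeps the feedback load at one bit per sub-channel.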

Although not using the bad sub-channels in the combining
slightly deteriorates the resulting Bit Error Rate (BER),
redirecting the power spent on these sub-channels to the
good ones improves the overall BER performance. This
is similar to the water filling algorithm if the value of the
assigned power to each sub-channel is limited to 0 or a
constant value.

The more important benefit of this algorithm, however,
appears in a multi user system. In this case, a bad channel
for user k is not necessarily bad for user l, thus turning
off such a sub-channel will reduce the interference that user
l receives. In other words, the users try to avoid the
sub-channels with a high level of interference. This results
in less average interference for the users and improves the
overall performance.

Furthermore, by selecting a different number of
sub-channels for different users, different Quality of Service
(QoS) levels can be provided.


3.  DROP  AND  ADD  (DA)  ALGORITHM 

When  the  channel  is  time  variable,  or  when  the  users  enter 
and  leave  the  system,  the  selected  sub-channels  must  be  up¬ 
dated  periodically.  The  period  of  the  updating  of  the  good 
sub-channels  must  be  short  enough  to  allow  the  system  to 
follow  the  changes  in  the  channel  and  long  enough  to  allow 
the  channel  weight  estimator  to  obtain  a  good  estimate. 

Since the channel weight estimator uses the output of
the sub-channels to estimate the weights, channel weights
will not be calculated for the off sub-channels and these
sub-channels should be probed before the selected sub-channels
are updated. Here we propose the Drop and Add (DA)
algorithm, in which each user drops the worst selected
sub-channel (minimum $|w_i|$) and adds a random off sub-channel
to the selected sub-channels with a period $KT$, where $K$ is
the total number of users in the system. If the newly added
sub-channel happens to be a bad one it will be dropped in
a later iteration; otherwise the user has found a good
sub-channel.

For faster convergence, we assume that all users are
synchronized in such a way that user $k$ performs the updating
iterations at times $t_n = nKT + kT$, $n = 0, 1, \ldots$, such that
the next user has at least $T$ seconds to adapt to the change.
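
One DA update for a single user might look like the sketch below. The function name `drop_and_add` is hypothetical, and two details are assumptions of this illustration: the added sub-channel is drawn uniformly from those that were off before the drop, and the weights of off sub-channels are simply ignored.

```python
import numpy as np

def drop_and_add(mask, w, rng):
    """Drop the selected sub-channel with the smallest |w_i| and add a random off one."""
    mask = mask.copy()
    on = np.flatnonzero(mask == 1)
    off = np.flatnonzero(mask == 0)
    if len(on) == 0 or len(off) == 0:
        return mask                          # nothing to drop or nothing to add
    worst = on[np.argmin(np.abs(w[on]))]     # weakest currently selected sub-channel
    new = rng.choice(off)                    # random probe among the off sub-channels
    mask[worst] = 0
    mask[new] = 1
    return mask
```

The number of selected sub-channels is unchanged by an update, so the per-user value of M (and hence the QoS level) is preserved while the selection drifts toward better sub-channels.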

4.  CONVERGENCE  OF  THE  DA  ALGORITHM 

The  system  can  be  modeled  as  a  Markov  Chain,  where  each 
state  will  describe  a  specific  sub-channel  selection  for  all 
the  users.  Thus  the  number  of  states  in  this  Markov  Chain 
will  be 

$$N_{MC} = \prod_{k=1}^{K} \binom{N}{M_k}, \qquad (3)$$

where $M_k$ is the number of selected channels for user $k$.
Furthermore,  since  different  users  have  different  channel 
gains,  the  convergence  analysis  for  such  a  system  is  com¬ 
plicated  even  for  small  number  of  users  in  the  network. 
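
To give a feel for the size of the state space, the sketch below counts the joint selections under the assumption that each user independently picks M_k out of N sub-channels, so that the per-user binomial coefficients multiply. The function name `num_states` is hypothetical.

```python
from math import comb, prod

def num_states(N, M):
    """Number of joint sub-channel selections when user k picks M[k] of N sub-channels."""
    return prod(comb(N, Mk) for Mk in M)
```

Even a toy system with N = 4 sub-channels and two users selecting two sub-channels each already has 36 states; for N = 64 the count is astronomically large, which is consistent with the remark that an analytical convergence study is complicated.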

In  this  paper  we  have  studied  the  convergence  of  the  DA 
algorithm  through  simulation.  A  Gaussian  model  has  been 
used  for  the  interference.  Over  time,  the  average  SNIR  of 
the  users  has  been  compared  to  the  optimum  case. 

Figure 2 shows this comparison for a system with K =
4 users with flat channels (equal sub-channel gains), N =
64 sub-channels and $M_k = 16$, $k = 1, \ldots, 4$, and Figure
3 shows the comparison with K = 2 users with different
frequency selective channels, with N = 64 sub-channels
and $M_k = 32$, $k = 1, 2$. For simplicity, these values have
been chosen to satisfy $N = \sum_k M_k$, which means that
sub-channel distributions exist which allow all users to
communicate without any interference. In general $\sum_k M_k$ can be
larger than N, which will result in some overlap between
the selected sub-channels of different users. Perfect
synchronization and exact channel weight estimation have been
assumed. Since in the down-link the original and the
interference signal have the same path, all sub-channels have
the same SNIR and will be equally preferred by the selection
process. In other words, the down-link can be modeled
by considering the same (flat) channels for all users. On the other
hand, for the up-link, the original and the interference
signals have different paths, thus the channels used for the
original and the interference signals can be frequency selective
and different.

In  both  cases  it  can  be  seen  that  although  the  average 
SNIR  resulting  from  the  DA  algorithm  does  not  converge  to 
the  optimum  value,  the  resulting  average  SNIR  is  close  to 
optimum  after  a  few  iterations. 

5.  BER  PERFORMANCE  RESULTS 

The  BER  performance  of  the  SCS  OFDM-SS  system  has 
been  studied  using  Monte  Carlo  simulations  of  the  system. 
Figure 2: Convergence of the DA algorithm for K = 4 users
with flat channels (down-link).


Figure  3:  Convergence  of  the  DA  algorithm  for  K  =  2  users 
with  different  frequency  selective  channels  (up-link). 


For  these  simulations  it  has  been  assumed  that  the  synchro¬ 
nization  and  the  channel  weight  estimates  are  perfect.  Also 
it  has  been  assumed  that  perfect  power  control  exists,  there¬ 
fore  the  original  and  interference  signals  have  equal  power. 
A  value  of  N  =  64  sub-channels  has  been  used,  and  a  fair 
distribution  of  the  sub-channels  is  considered,  i.e.  Mk  = 
N/K,  k  =  1, ...,  K.  Also  random  spreading  codes  (signa¬ 
tures)  have  been  used. 

Figure 4 shows the BER performance of each user for
the down-link (flat channels) of an SCS OFDM-SS system
with K = 4 users. The result has been compared with the
BER performance in an equivalent conventional OFDM-SS
system. Figure 5 compares the BER performance of the
SCS and conventional systems for the up-link (frequency
selective channels) for the K = 2 user case. The frequency
selective channels have been modeled by two rays, with
equal power and different delays. Second-ray delays
of $D_1 = 16T_b/N$ and $D_2 = 4T_b/N$ have been used for users
1 and 2, respectively.

Figure 4: Comparison of BER performance of the down-link
of the SCS and conventional OFDM-SS systems for K = 4
users.

Figure 5: Comparison of BER performance of the up-link
of the SCS and conventional OFDM-SS systems for K = 2
users.

In  both  cases  we  can  see  that  the  SCS  OFDM-SS  system 
has  a  strong  advantage  in  BER  performance  compared  to  the 
conventional  OFDM-SS  system. 

6.  CONCLUSIONS 

A simple, sub-optimal distributed algorithm for sub-channel
allocation in an OFDM-SS system has been proposed and
simulated.  The  convergence  of  the  algorithm  has  been  stud¬ 
ied.  Also  the  overall  BER  performance  of  the  system  has 
been  obtained  and  compared  to  that  of  the  conventional  sys¬ 
tem.  The  results  show  significant  improvement  in  the  BER 
performance  of  the  users. 
ABSTRACT

A  blind  second  order  statistical  (SOS)  subspace  based  channel 
identification  and  equalization  technique  is  introduced  and  investi¬ 
gated  for  bandwidth  efficient  Orthogonal  Frequency  Division  Mul¬ 
tiplexing  (OFDM)  systems.  A  suitable  zero-forcing  linear  equal¬ 
izer  (ZF-LE)  is  also  proposed.  Simulations  show  that  identifica¬ 
tion  and  equalization  is  possible  with  only  a  small  number  of  short 
length  OFDM  symbols. 

1.  INTRODUCTION 

In  OFDM  systems,  a  cyclic  prefix  (CP)  is  used  to  combat  inter¬ 
block-interference  (IBI)  caused  by  the  multi-path  channel.  Addi¬ 
tionally,  the  CP  allows  for  a  simple  one  tap  equalization  scheme. 
However,  due  to  the  extra  symbols  required  by  the  CP,  the  OFDM 
spectrum  is  underutilized.  This  overhead  can  be  significant  for 
channels  with  long  impulse  responses  and  short  block  transmission 
formats.  One  may  omit  the  CP  in  an  attempt  to  raise  the  spectral 
efficiency  but  at  the  expense  of  increased  receiver  complexity. 

A  number  of  techniques  [9, 4,  8]  have  been  proposed  for  spec¬ 
trally  efficient  OFDM  systems  which  do  not  use  a  CP.  However, 
it  appears  that  the  only  technique  that  has  so  far  been  proposed 
to  estimate  and  equalize  the  channel  characteristics  in  a  spectrally 
efficient  OFDM  system  is  the  blind  iterative  block  technique  in 
[8]. This paper proposes a new channel estimation and equalization
scheme which has a number of advantages over the scheme in
[8].

The  contributions  of  this  paper  are  as  follows: 

(i)  A  new  subspace  based  blind  channel  estimator  for  CP-free 
OFDM  is  developed.  The  main  idea  is  based  upon  oversampling 
the  channel  output  in  the  spatial  domain  by  using  multiple  anten¬ 
nas  and  then  estimating  the  channel  based  on  the  second  order 
statistics  (SOS)  of  the  received  signal.  (By  using  spatial  diversity, 
one  can  achieve  enhanced  performance  without  additional  power 
or  bandwidth  consumption.) 

(ii)  It  is  shown  how  the  proposed  estimator  takes  into  account 
known  zeros  in  the  input  stream  for  channel  identification.  (These 
known  zeros  in  the  input  stream  are  referred  to  as  virtual  carri¬ 
ers,  and  are  sometimes  used  in  practical  OFDM  systems  [7].)  The 
performance  of  the  new  method  is  compared  with  the  channel  sub¬ 
space  (CS)  method  of  Moulines  et  al.  [6]  and  the  scheme  in  [8] 
through computer simulations. The method is shown to outperform
existing methods if the number of short length OFDM symbols
is small. The technique appears to achieve a good trade-off
between estimation accuracy and receiver complexity.

(iii)  A  matching  zero-forcing  linear  equalizer  (ZF-LE)  is  de¬ 
veloped. 

It  is  noted  that  the  channel  estimation  and  equalization  scheme 
in  this  paper  requires  (i)  N  >  2L  and  (ii)  the  sub-channels  to  have 
no  common  zeros.  Here,  N  denotes  the  OFDM  symbol  length  and 
L  is  an  upper  bound  on  the  order  of  the  sub-channels. 

2.  BANDWIDTH  EFFICIENT  OFDM  BASEBAND  MODEL 
BASED  ON  SPATIAL  DIVERSITY  AND  BLIND  CHANNEL 
ESTIMATION 

We consider the baseband discrete time OFDM system with multiple
FIR model arrangement as shown in Fig. 1(a). We assume that
this multi-channel FIR system arises from oversampling the channel
output in the spatial domain using a multiple receiver system
(shown in Fig. 1(b)) and consists of Z sub-channels of length at
most L + 1. The input signal $s(n)$ and the additive white Gaussian
noise (AWGN) $v(n)$ are mutually uncorrelated and stationary. The
noise $v(n)$ is also assumed to be uncorrelated among channels. Also, we
assume perfect synchronization of carriers and symbols.

Let us consider a block of N complex valued source symbols
at the OFDM transmitter which is given by: $s(n) = [s(nN), \ldots,
s(nN-N+1)]^T$. The elements of $s(n)$ are assumed to be i.i.d. and
taken from the complex alphabet $\mathcal{V} = \{v_1, v_2, \ldots, v_\nu\}$ of size $\nu$.
Considering $s(n)$ to be in the frequency domain, the time domain
signal is generated by the N point inverse fast Fourier transform
(IFFT) of the source symbol block $s(n)$ expressed as: $z(n) =
F_N s(n) = [z(nN), z(nN-1), \ldots, z(nN-N+1)]^T$, where
$F_N = [f_0, f_1, \ldots, f_{N-1}]$ is an $N \times N$ IFFT matrix. After
parallel-to-serial (P/S) conversion and modulation, the transmitted symbols
propagate through multiple channels with impulse response vectors
$h^{(r)} := [h^{(r)}(0), \ldots, h^{(r)}(L)]^T$, $r = 0, \ldots, Z-1$. At each
sensor the received signal is band pass filtered and down-converted
to baseband OFDM. The nth received signal at each sensor can be
written in block form as: $y^{(r)}(n) = [y^{(r)}(nN), y^{(r)}(nN-1), \ldots,
y^{(r)}(nN-N+1)]^T$. We can relate transmit with receive blocks as:

$$y^{(r)}(n) = H_0^{(r)} F_N s(n) + H_1^{(r)} F_N s(n-1) + v^{(r)}(n) \qquad (1)$$

where (i) $v^{(r)}(n)$ is the AWGN vector at the rth sensor, (ii) $H_0^{(r)}$ is
the $N \times N$ Toeplitz matrix with first column $[h^{(r)}(0), \ldots, h^{(r)}(L),
0, \ldots, 0]^T$ and first row $[h^{(r)}(0), 0, \ldots, 0]$ and (iii) $H_1^{(r)}$ is the
$N \times N$ Toeplitz matrix with first column $[0, \ldots, 0]^T$ and first
row $[0, \ldots, h^{(r)}(L), \ldots, h^{(r)}(1)]$. Due to the dispersive nature
of the rth channel, IBI arises between successive blocks and renders
$y^{(r)}(n)$ in (1) dependent on both $s(n)$ and $s(n-1)$. This IBI
destroys the orthogonality of the distinct sub-carriers.
To avoid the IBI, we consider a truncated version
of $y^{(r)}(n)$. Suppose that the channel order L is known and L
is smaller than N. The truncated version of $y^{(r)}(n)$ can be written
as: $\bar{y}^{(r)}(n) = [y^{(r)}(nN-L), y^{(r)}(nN-L-1), \ldots, y^{(r)}(nN-N+1)]^T$.
Note that the length of $\bar{y}^{(r)}(n)$ is Q, where Q = N - L.
Then, $\bar{y}^{(r)}(n)$ depends only on $s(n)$, not on $s(n-1)$, that is

$$\bar{y}^{(r)}(n) = \bar{z}^{(r)}(n) + \bar{v}^{(r)}(n) = \mathcal{H}^{(r)} F_N s(n) + \bar{v}^{(r)}(n) \qquad (2)$$

where $\mathcal{H}^{(r)}$ is the Toeplitz channel filtering matrix of size $(N-L) \times N$,
with first row $[h^{(r)}(L), \ldots, h^{(r)}(0), 0, \ldots, 0]$ and first
column $[h^{(r)}(L), 0, \ldots, 0]^T$. Stacking the outputs of the Z channels
gives:

$$\bar{y}(n) = \mathcal{H} F_N s(n) + \bar{v}(n) \qquad (3)$$

where $\bar{y}(n) = [\bar{y}^{(0)T}(n), \ldots, \bar{y}^{(Z-1)T}(n)]^T$, $\mathcal{H} = [\mathcal{H}^{(0)T}, \ldots, \mathcal{H}^{(Z-1)T}]^T$
and $\bar{v}(n) = [\bar{v}^{(0)T}(n), \ldots, \bar{v}^{(Z-1)T}(n)]^T$.

Fig. 1. Discrete time baseband: (a) transmitter model with multiple
FIR channels, (b) receiver model with multiple antennas

The matrix $\mathcal{H}$ is known as a Generalized Sylvester Matrix,
which has full column rank N under the conditions [6]: (a0)
the polynomials $H^{(r)}(z) = \sum_{j=0}^{L} h^{(r)}(j) z^j$ have no common zero,
(a1) Q is greater than the maximum degree L of the polynomials
$H^{(r)}(z)$, i.e., Q > L, and (a2) at least one polynomial $H^{(r)}(z)$
has degree L. The conditions (a0)-(a2) are assumed to hold
throughout this paper, that is, $\mathcal{H}$ is assumed to have full rank.

Given a block of data $\{\bar{y}(n)\}_{n=0}^{K-1}$, the objective here is to
estimate the $Z(L+1) \times 1$ vector $h = [h^{(0)T}, \ldots, h^{(Z-1)T}]^T$.


We choose to collect K consecutive data vectors $\{\bar{y}(n)\}_{n=0}^{K-1}$
in a matrix: $\bar{Y}_K := [\bar{y}(0), \ldots, \bar{y}(K-1)] = \mathcal{H} F_N S_K + \bar{V}_K$.
The covariance matrix of the received data is thus

$$R_{yy} = E(\bar{Y}_K \bar{Y}_K^H) = \mathcal{H} F_N R_{ss} F_N^H \mathcal{H}^H + R_{vv} \qquad (4)$$

where $R_{ss} = E(S_K S_K^H)$ and $R_{vv} = E(\bar{V}_K \bar{V}_K^H)$. It is assumed
that the noise is white ($R_{vv} = \sigma^2 I$) and the input signal is
rich enough that $R_{ss}$ has full rank, i.e., rank($R_{ss}$) = N. As in
[6], the EVD of $R_{yy}$ is expressed as

$$R_{yy} = S\,\mathrm{diag}(\lambda_0, \ldots, \lambda_{N-1})\,S^H + \sigma^2 G G^H \qquad (5)$$

where $S = [S_0, \ldots, S_{N-1}]$ and $G = [G_0, \ldots, G_{Z(N-L)-N-1}]$.
The columns of S span the signal subspace, while those of G span the
noise subspace. The columns of $\mathcal{H}$ also span the signal subspace
and thus, by orthogonality, we have:

$$G_i^H \mathcal{H} = 0, \quad 0 \le i \le Z(N-L)-N-1. \qquad (6)$$

This expression is different from that in the CS method because we
calculate $R_{yy}$ based on $\bar{y}(n)$ and not on $y(n)$. However, it can be
solved in the least squares sense, as in the CS method, to uniquely
identify h.
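
The signal/noise subspace split can be illustrated numerically. The sketch below is a minimal illustration under assumptions: the function name `noise_subspace` is hypothetical, the covariance matrix is assumed to be known exactly (in practice it would be a sample average), and the signal subspace dimension is passed in by the caller.

```python
import numpy as np

def noise_subspace(R, signal_dim):
    """Eigenvectors of the Hermitian matrix R belonging to its smallest eigenvalues.

    np.linalg.eigh returns eigenvalues in ascending order, so the first
    R.shape[0] - signal_dim eigenvectors span the noise subspace.
    """
    _, eigvecs = np.linalg.eigh(R)
    return eigvecs[:, : R.shape[0] - signal_dim]
```

For a covariance of the form H H^H plus white noise, the returned columns are orthogonal to the columns of H, which is the property that the orthogonality relation in the text exploits.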

In practical OFDM systems, some sub-carriers are not modulated
[7]. These virtual carriers are not used for data transmission
but are usually introduced inside the roll-off region (to create a null
guard interval) to avoid aliasing effects on data symbols when the
system operates over multi-path propagation channels [2].

We assume the presence of N - P virtual carriers at the tail
end of each OFDM symbol. Thus each OFDM symbol consists
of P modulated source symbols and N - P non-modulated symbols.
The IFFT matrix $F_N$ thus reduces to a partial $N \times P$ matrix
$\tilde{F}_N = [f_0, f_1, \ldots, f_{P-1}]$. The removed N - P columns of $F_N$
correspond to the N - P virtual carriers in $s(n)$. The received
data model is then given by:

$$\bar{y}(n) = \mathcal{H} \tilde{F}_N \tilde{s}(n) + \bar{v}(n) \qquad (7)$$

where $\tilde{s}(n)$ denotes the new data vector of reduced length P. Under
the data model (7), the resulting equation (6) no longer has a
unique solution because rank($R_{ss}$) = P. (This is in contrast to
rank($R_{ss}$) = N in the case of no virtual carriers.) In the following
section, the corresponding adjustments which take into account the
virtual carriers are detailed.

Following  the  analogous  steps  (4)-(5)  for  the  data  model_(7), 
the  corresponding  basis  for  the  noise  subspace  is  given  by:  G  = 
[Go,Gi,...  , Gz(jv_i,)-n-i]'  Also,  we  observe  that  the  columns 
of  HF,\  also  span  the  signal  subspace  and  thus  by  orthogonality 
we  have:  GI'  HFn  =  0.  In  practice,  since  the  output  data  vec¬ 
tors  are  noisy,  this  equation  is  solved  by  minimizing  the  quadratic 
form: 

9<h)  =  Ei= o  |  G^'PFn]2.  (8) 

Let  G'/UFk  =  hT(7,Fjv  where  Oi  is  the  Z(L  +  l)xiV 
filtering  matrix  associated  with  G,  and  can  be  obtained  by  back 
substitution.  Therefore  |  G\'Hp¥n  ‘\2—  h7,(?,FjvF[v(j/,h  and 
equation  (8)  can  thus  be  expressed  as:  q(h)  =  hHQh,  where 
Q  =  A  G&nFnG!1  and  channel  estimate  can 

thus  be  formulated  as 

h  =  arg  min  ||  h/fQh  ||2  .  (9) 

l|h||  =  l 
This quadratic optimization criterion allows unique estimation of $h$ up to a scale factor, and $\hat{h}$ is thus obtained as the eigenvector associated with the minimum eigenvalue of $Q$.
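The eigenvector step can be sketched numerically as follows. This is only an illustration: the matrix `Q` below is a toy Hermitian matrix constructed so that a known vector `h_true` lies in its null space; it is not built from noise-subspace filtering matrices, and the names `min_eigenvector` and `h_true` are assumptions introduced here.

```python
import numpy as np

def min_eigenvector(Q):
    """Unit-norm eigenvector of the Hermitian matrix Q with the smallest eigenvalue."""
    eigvals, eigvecs = np.linalg.eigh(Q)  # eigh returns eigenvalues in ascending order
    return eigvecs[:, 0]

rng = np.random.default_rng(0)
h_true = rng.standard_normal(6) + 1j * rng.standard_normal(6)
h_true /= np.linalg.norm(h_true)

# Toy Q (assumption): projector onto the orthogonal complement of h_true,
# so that h_true minimises h^H Q h subject to ||h|| = 1.
Q = np.eye(6) - np.outer(h_true, h_true.conj())

h_hat = min_eigenvector(Q)
# The estimate is only defined up to a complex scale factor, so compare directions.
alignment = abs(np.vdot(h_hat, h_true))
```

An `alignment` of one indicates that the estimate matches the true vector up to the scale (phase) ambiguity mentioned above.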

3.  ZERO  FORCING-LINEAR  EQUALIZER  (ZF-LE) 

In this section we present a ZF-LE which gives linear estimates of the input symbols $s(n)$ based on the received data and channel state information according to the ZF criterion.

Let $\hat{\mathcal{H}}$ denote the estimate of the matrix $\mathcal{H}$. With $\hat{\mathcal{H}}$ and $F_N$ (or $\bar{F}_N$ in the case of virtual carriers) known at the receiver, the received signal in (3) can be written as $y(n) = A s(n) + v(n)$, where $A = \hat{\mathcal{H}} F_N$.

From estimation theory, the continuous-valued unbiased maximum-likelihood estimate $\hat{s}(n)$ of the vector $s(n)$ is given by $\hat{s}(n) = G_{zf}\, y(n)$, where $G_{zf}$ is the ZF-LE given by $G_{zf} = A^{\dagger}$, and $A^{\dagger}$ denotes the Moore-Penrose pseudo-inverse of $A$. (We observe that the ZF-LE is the same as the MMSE-ZF-LE in [3].)

Note that the source symbols can be recovered provided $A$ has full rank, which it does if assumptions (a0)-(a2) hold. The ZF-LE can be implemented as shown in Fig. 1(b).
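A minimal sketch of the pseudo-inverse equaliser is given below. The matrix `A` is a random full-column-rank stand-in for $\hat{\mathcal{H}} F_N$, and the dimensions and noise level are illustrative assumptions rather than the simulation settings of this paper.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 8                       # number of carriers (illustrative)
rows = 24                   # length of the stacked received vector (assumed)

# Stand-in for A = H_hat F_N: a random tall matrix, full column rank with probability one.
A = rng.standard_normal((rows, N)) + 1j * rng.standard_normal((rows, N))

s = rng.choice([-1.0, 1.0], size=N)                 # BPSK source symbols
sigma = 0.05                                        # noise standard deviation (assumed)
v = sigma * (rng.standard_normal(rows) + 1j * rng.standard_normal(rows))
y = A @ s + v                                       # received data model y = A s + v

G_zf = np.linalg.pinv(A)                            # ZF-LE: Moore-Penrose pseudo-inverse
s_soft = G_zf @ y                                   # continuous-valued estimate
s_hard = np.sign(s_soft.real)                       # BPSK decisions
```

Because `G_zf @ A` is the identity when `A` has full column rank, the estimate is unbiased and the only residual error is the filtered noise `G_zf @ v`.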

4.  NUMERICAL  RESULTS 

We provide simulation results which compare the performance of our algorithm to the schemes in [6] and [8]. The input symbols were randomly drawn BPSK symbols. We simulated the output of Z = 4 receivers, and all evaluations are made for an N = 25 carrier OFDM system (unless otherwise specified). The channel coefficients are chosen as in [6]. Fig. 2(a) shows the zero locations of the channel set. To evaluate the channel estimation error, we employed the normalized root-mean-square error (NRMSE) (as defined in [1]) with 100 Monte Carlo runs.

Simulation Example 1. Fig. 2(b) shows the estimator performance at an SNR of 35 dB as a function of the number of OFDM symbols (from 40 to 160). We can see that the performance of the estimator improves as the number of symbols increases, and that a large number of symbols is required to obtain good channel estimates.

Simulation Example 2. In this simulation study, we fixed the number of symbols at 100 and varied the SNR from 10 to 40 dB. Fig. 2(c) shows the NRMSE as a function of SNR. We observe that the estimator performance improves with increasing SNR.

Simulation Example 3. We investigated the influence of the OFDM symbol length N on the channel estimator performance. The SNR is fixed at 35 dB. Fig. 2(d) shows that the performance of the estimator improves with increasing N; however, performance does not improve much beyond N = 25 as the number of OFDM symbols increases. It is observed that, in contrast to N = 25, the estimator performance degrades with N = 30 and N = 35 for 40 OFDM symbols. This indicates that the proposed channel estimator is well suited to a sufficiently small number of short OFDM symbols.

Simulation Example 4. Fig. 3(a) illustrates the effect of increasing the number of virtual carriers at the tail end of the OFDM symbols. We observe that, with a small number of OFDM symbols, virtual carriers degrade the estimator performance. It is also clear that this effect can be suppressed by increasing the number of OFDM symbols.

Simulation Example 5. We also implemented the CS method for SNR = 35 dB, 2 virtual carriers and 80 OFDM symbols. As can be seen from Fig. 3(b), the NRMSE of the CS method increases with the window length.

Fig. 2. (a) Zero locations; (b) channel error vs OFDM symbols; (c) channel error vs SNR; and (d) channel error vs OFDM symbols of different lengths.

Simulation Example 6. Fig. 3(c) shows the performance of the CS method and the new estimator for a fixed window length Q = 22, 2 virtual carriers and SNR = 35 dB as the number of OFDM symbols increases. The new technique is seen to closely approach the performance of the CS method when the number of OFDM symbols is greater than 60, and to outperform the CS method when the number of OFDM symbols is less than 60. We thus have the possibility of a complexity trade-off: an increase in performance, as well as identification with a much smaller number of symbols, can be achieved with the new technique by providing additional receiver antennas.

Simulation Example 7. For the channel set, the phase pattern of the receiver outputs is plotted in Fig. 3(d). Using the channel estimates obtained via the proposed method for 40 OFDM symbols at 35 dB, a ZF equalizer was implemented. The equalized phase pattern is shown in Fig. 4(a).

Simulation Example 8. It is interesting to compare the performance of the proposed estimator with that of [8] for short OFDM symbols. Allowing for 4 virtual carriers, 1000 OFDM symbols and 100 iterations, Fig. 4(d) gives an insight into the channel estimates of [8], where the phase pattern is obtained using the corresponding channel estimates. It is clear that the channel estimates of [8] are unacceptable and equalization is impossible. In contrast, as shown in the previous simulation results, the proposed method converges rapidly with a much smaller number of short OFDM symbols.

Remark 1: The processing of long OFDM symbols causes significantly large time delays, which is an important factor in delay-sensitive services. Since our proposed technique is feasible for short OFDM symbols, it is suitable for use in wireless local area networks (WLANs) [5].

5. CONCLUSION


Fig.  3.  (a)  Channel  error  vs  OFDM  symbols  with  different  number 
of  virtual  carriers  (b)  CS  method:  channel  error  vs  window  length 
(c)  CS  vs  new  method:  channel  error  vs  OFDM  symbols  and  (d) 
phase  pattern  before  equalization 




In conclusion, although Moulines' method works even if there are virtual carriers, it can be modified to improve performance when virtual carriers are present and the number of OFDM symbols is small.
ABSTRACT

Trailing zero OFDM systems replace the cyclic prefix in OFDM systems by a sequence of zeros. However, this paper shows that a cyclic prefix is still present in TZ-OFDM systems. Indeed, it is shown that a TZ-OFDM system implicitly works by first adding redundancy to the symbols to be transmitted (channel coding) and then adding a cyclic prefix. This paper also proves that the channel coding introduced by the TZ-OFDM system is spectrally balanced, meaning that it maps white noise to (essentially) white noise. This is a desirable property because it is known that any coding scheme which achieves the channel capacity over an unknown multipath channel must be white-like. By introducing the Cramer-Rao Bound as a figure of merit, it is shown that there exist channels over which a TZ-OFDM system performs worse than an uncoded OFDM system. The Cramer-Rao Bound is also used to explain why using a cyclic prefix is desirable; it allows channels with otherwise unstable inverses to be inverted accurately.

1. INTRODUCTION

Orthogonal Frequency Division Multiplex (OFDM) systems [2, 8] transmit data in blocks, with a cyclic prefix added to the start of each block. Recently, it was proposed in [5, 7] to replace the cyclic prefix by a sequence of zeros. The resulting system is known as a Trailing Zero OFDM (TZ-OFDM) system. The main advantage [4] of TZ-OFDM over OFDM is that the source symbols can always be recovered regardless of the location of the channel zeros.

On the surface, it may appear that TZ-OFDM and OFDM (henceforth referred to as Cyclic Prefixed OFDM, or CP-OFDM for short) have little in common. The main contribution of this paper is to show that TZ-OFDM, with one proviso¹, is actually a special case of channel coded CP-OFDM. Moreover, it is proved that the channel coding is such that the transmitted signal has a flat power spectrum provided the source symbols are white. This is a desirable property of a channel coder for unknown multipath channels; if the locations of the channel spectral nulls are unknown, then it is best to spread the transmitter power evenly over all sub-channels [1, 3, 6].

This work was performed while the author was a Tan Chin Tuan Exchange Fellow at Nanyang Technological University, Singapore.

¹The channel coded CP-OFDM interpretation results in 2L - 2 zeros between each block whereas TZ-OFDM uses only L - 1 zeros. However, the performance of these two systems, as measured by the MSE of the equalised source symbols, is identical.

2. CP-OFDM INTERPRETATION OF TZ-OFDM

The channel coded CP-OFDM interpretation of TZ-OFDM is illustrated below by way of example. Consider transmitting two blocks of three symbols each over a length L = 2 FIR channel using a TZ-OFDM framework. For convenience, the following two blocks of symbols are chosen:

$[6,\ -1.5 + j0.866,\ -1.5 - j0.866]^T,$

$[15,\ -1.5 + j0.866,\ -1.5 - j0.866]^T.$    (1)

First, the blocks are IDFT'ed, to obtain:

$[1, 2, 3]^T, \quad [4, 5, 6]^T.$    (2)

A trailing zero (just one, since $L - 1 = 1$) is now added to each block. However, it is also necessary to add a zero at the very start for initialisation purposes. Therefore, the transmitted sequence is

$\{0, 1, 2, 3, 0, 4, 5, 6, 0\}.$    (3)

It is now shown how the same result can be obtained in a linearly precoded CP-OFDM framework.

Define the linear precoder matrix

$$P = \begin{bmatrix} 1 & 0 & 0 \\ -0.333j & 0.789 + 0.455j & 0.211 - 0.122j \\ 0.333 & 0.333 - 0.577j & 0.333 + 0.577j \\ 0.333j & 0.211 + 0.122j & 0.789 - 0.455j \end{bmatrix}. \quad (4)$$

Then, apply $P$ to the two blocks in (1) to obtain:

$P\,[6,\ -1.5 + j0.866,\ -1.5 - j0.866]^T = [6,\ -2 - 2j,\ 2,\ -2 + 2j]^T,$    (5)

$P\,[15,\ -1.5 + j0.866,\ -1.5 - j0.866]^T = [15,\ -2 - 5j,\ 5,\ -2 + 5j]^T.$    (6)

Now, encode each block as in a conventional CP-OFDM system, first by taking the IDFT of each block to obtain:

$[1, 2, 3, 0]^T, \quad [4, 5, 6, 0]^T$    (7)

and then by adding a cyclic prefix:

$[0, 1, 2, 3, 0]^T, \quad [0, 4, 5, 6, 0]^T.$    (8)
Now, a CP-OFDM system would transmit

$\{0, 1, 2, 3, 0, 0, 4, 5, 6, 0\}.$    (9)

However, since the memory length of the channel is one, there is no reason for transmitting two consecutive zeros. (Precisely, no extra information is gained at the receiver by transmitting two zeros rather than a single zero.) A TZ-OFDM system exploits this fact by replacing the two zeros by a single zero, as in (3). In effect, it is using the zero suffix of the first block (introduced by the channel coding operation) to also serve as a cyclic prefix of the second block.

The fact that a TZ-OFDM receiver does indeed use the zero suffix of the previous block as a cyclic prefix for the current block can be verified by observing that, unlike a CP-OFDM receiver which discards the guard interval, a TZ-OFDM receiver does not. Continuing the above example, a CP-OFDM transmitter encodes the blocks in (1) as {3, 1, 2, 3, 6, 4, 5, 6}, yet a CP-OFDM receiver can use only 3 received symbols (those corresponding to 1, 2, 3) to recover the data in the first block. This is due to the problem of inter-block interference (IBI). A TZ-OFDM receiver, on the other hand, can use all 4 symbols (corresponding to 1, 2, 3, 0 and 4, 5, 6, 0 in (3)) to recover each block. This is because the zero suffix of the previous block eliminates IBI. That is, the receiver does indeed depend on the previous block having a zero suffix.

Key Point: A CP-OFDM receiver applied to (9) and modified to take into account the presence of the precoder P works identically to a TZ-OFDM receiver applied to (3). This is because the CP-OFDM system uses a transmit block size of 4 (as measured before the cyclic prefix is added) and hence the receiver uses 4 symbols per received block (corresponding to 1, 2, 3, 0 and 4, 5, 6, 0 in (9), since it discards the guard interval) to recover the data. The TZ-OFDM system uses a transmit block size of 3 (as measured before the trailing zero is added) and hence the receiver uses 4 symbols per received block (corresponding to 1, 2, 3, 0 and 4, 5, 6, 0 in (3)) to recover the data. Therefore, the performance of the TZ-OFDM system can be analysed by studying the performance of the channel coded CP-OFDM interpretation of it.

3. CHANNEL CODING

The previous section showed that the performance of a TZ-OFDM system is equivalent to the performance of a channel coded CP-OFDM system, where P in (4) performs the channel coding. Since a CP-OFDM system transmits each symbol over an independent sub-channel, the effect of P in (5) is to spread the 3 source symbols out over 4 sub-channels, that is, it spreads the spectrum of the source symbols. This helps explain why a TZ-OFDM system can recover the source symbols regardless of the location of the channel zeros.

This section proves that the precoder P spreads the spectrum in a very special way; it maps white noise to (essentially) white noise. This is a desirable property because it is known that any coding scheme which achieves the channel capacity over an unknown multipath channel must be white-like.

Consider a general TZ-OFDM system operating over a channel of length L and using a block size of p. In order to make (9), with the 2L - 2 zeros between each block reduced to only L - 1 zeros, identical to (3), it is necessary to choose the channel coder P in (5) to be

$$P = \frac{1}{p}\, D_{p+L-1} \begin{bmatrix} D_p^H \\ 0_{(L-1) \times p} \end{bmatrix} \quad (10)$$

where $D_n$ denotes the $n \times n$ DFT matrix. (Substituting p = 3 and L = 2 into (10) yields (4).) Such a P has the following property, the proof of which is omitted.

Theorem 1 Define $P$ as in (10). Then $PP^H$ is a circulant matrix with ones along the diagonal.
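The worked example and Theorem 1 can be checked numerically with a short sketch. It assumes that $D_n$ is the unnormalised DFT matrix with entries $e^{-2\pi j km/n}$ (the convention of `numpy.fft.fft`) and that the IDFT carries the factor $1/n$; the helper names `dft_matrix` and `tz_precoder` are introduced here for illustration only.

```python
import numpy as np

def dft_matrix(n):
    """Unnormalised n x n DFT matrix with entries exp(-2j*pi*k*m/n)."""
    return np.fft.fft(np.eye(n))

def tz_precoder(p, L):
    """Channel coder of (10): P = (1/p) D_{p+L-1} [D_p^H ; 0_{(L-1) x p}]."""
    n = p + L - 1
    stacked = np.vstack([dft_matrix(p).conj().T, np.zeros((L - 1, p))])
    return dft_matrix(n) @ stacked / p

P = tz_precoder(3, 2)                 # should agree with (4) to three decimals
block = np.fft.fft([1, 2, 3])         # first block in (1)
coded = P @ block                     # left-hand side of (5)
time_domain = np.fft.ifft(coded)      # first block in (7)
gram = P @ P.conj().T                 # P P^H of Theorem 1
```

Under these assumptions `coded` reproduces (5), `time_domain` reproduces the first block of (7), and `gram` has a unit diagonal and is unchanged by a simultaneous cyclic shift of its rows and columns, which is the circulant property.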

If the source symbols $s \in \mathbb{C}^p$ are white ($E[ss^H] = I$) then the covariance of the precoded symbols $Ps$ is $E[(Ps)(Ps)^H] = PP^H$. It follows from Theorem 1 that, on average, power is distributed equally on the $p + L - 1$ sub-channels in a TZ-OFDM system. (Recall that the channel coded CP-OFDM interpretation of TZ-OFDM systems makes it possible to speak of independent sub-channels in a TZ-OFDM system.)

Remark: Since $P$ is a tall matrix, it is not possible for the transmitted symbols to be white (that is, for $PP^H = I$). However, Theorem 1 shows that $PP^H$ has ones along the diagonal, which is interpreted here as being "almost white", or white-like. The important point is that power is distributed equally on the sub-channels, a pleasing property if the location of the channel spectral nulls is not known.

4.  A  PERFORMANCE  MEASURE 

OFDM  receivers  operate  on  the  received  blocks  independently  of 
each  other.  Therefore,  the  performance  of  an  OFDM  system  is 
fully  determined  once  the  performance  of  recovering  a  single  block 
is  known.  This  section  shows  how  the  Cramer-Rao  Bound  can  be 
used  to  measure  the  performance  of  a  CP-OFDM  or  TZ-OFDM 
system.  Also,  the  channel  coded  CP-OFDM  interpretation  of  a 
TZ-OFDM  system  is  used  to  construct  a  channel  h  over  which, 
somewhat  surprisingly,  a  TZ-OFDM  system  performs  worse  than 
an  uncoded  CP-OFDM  system. 

Consider using an arbitrary linear precoder matrix $P \in \mathbb{C}^{n \times p}$ to encode a single block of unknown source symbols $s \in \mathbb{C}^p$ prior to transmission through a finite impulse response (FIR) channel $h = [h_0, \cdots, h_{L-1}]^T \in \mathbb{C}^L$. The received vector $y \in \mathbb{C}^{n-L+1}$ is given by

$y = \mathcal{H} P s + \mathbf{n}, \quad \mathbf{n} \sim \mathcal{N}(0, I)$    (11)

where $\mathcal{H}$ is the upper triangular $(n - L + 1) \times n$ Toeplitz channel matrix with first row equal to $[h_{L-1}, \cdots, h_0, 0, \cdots, 0]$ and $\mathbf{n}$ denotes additive white Gaussian noise (AWGN) with unit variance ($E[\mathbf{n}\mathbf{n}^H] = I$).

Remark 1: The fact that $\mathcal{H}$ has fewer rows than columns is due to the memory of the channel and is related to the problem of IBI; without knowing something about the symbols transmitted just prior to $Ps$ being transmitted, the received symbols corresponding to the first $L - 1$ symbols of $Ps$ provide no information to the receiver about $s$. Similarly, if nothing is known about the symbols transmitted just after $Ps$ is transmitted, then no information about $s$ is gained by observing symbols received after $Ps$ is transmitted.

It is assumed for simplicity that the receiver has perfect knowledge of the channel parameters $h$. Then, for any unbiased estimate $\hat{s}$ of $s$, the error covariance matrix $E[(\hat{s} - s)(\hat{s} - s)^H]$ is lower bounded by the Cramer-Rao Bound (CRB)

$R = (P^H \mathcal{H}^H \mathcal{H} P)^{-1}.$    (12)

In fact, this lower bound is met with equality if the maximum-likelihood (ML) decoder

$\hat{s} = (P^H \mathcal{H}^H \mathcal{H} P)^{-1} P^H \mathcal{H}^H y$    (13)

is used.

Remark 2: Since (13) is an unbiased estimate, it is also referred to as a Zero Forcing (ZF) equaliser in the literature. Since it achieves the CRB, it is the best possible ZF equaliser for recovering $s$ from $y$.

It is proposed to use $R$ as a figure of merit for the precoder $P$. Indeed, for any given channel $h$, the diagonal element $R_{ii}$ is the mean-square error (MSE) of the estimate of the $i$th element of $s$ if the ML decoder is used, while the off-diagonal element $R_{ij}$ is the correlation between the errors in the estimates of the $i$th and $j$th elements of $s$.

Remark 3: The CRB ignores the fact that practical communication systems transmit symbols coming from a finite alphabet. However, it is clear that if the elements of $R$ are large then the bit error rate of a practical system using the linear precoder $P$ will be adversely affected.

The relevance of (11) and (12) to practical CP-OFDM and TZ-OFDM systems is now explained. A CP-OFDM system first IDFT's the block and then adds a cyclic prefix of length $L - 1$. Since a CP-OFDM receiver operates on blocks separately, it does not keep any information about the previously decoded block. Therefore, the first $L - 1$ received symbols of the current block provide no information about the transmitted symbols (see Remark 1 above) and are discarded by the receiver. Thus, for a CP-OFDM system, if $P = C D^H$ where

$$C = \begin{bmatrix} \begin{matrix} 0_{(L-1) \times (n-2L+2)} & I_{L-1} \end{matrix} \\ I_{n-L+1} \end{bmatrix} \quad (14)$$

adds a cyclic prefix and $D$ is a DFT matrix, then (11) correctly models all the information available to the receiver about the current block $s$, and hence (12) is the best achievable performance of any unbiased equaliser in a CP-OFDM system.

A TZ-OFDM system first IDFT's the block and then appends $L - 1$ trailing zeros. Unlike in a CP-OFDM system though, the first $L - 1$ received symbols of the current block do provide information about the transmitted symbols. This is because the receiver knows that $L - 1$ zeros (corresponding to the trailing zeros of the previous block) were transmitted just prior to the current block! In order to incorporate this extra information into (11), it is necessary to use a $P$ different from expected. Specifically, the correct $P$ is one which first IDFT's the block and then adds both $L - 1$ leading zeros and $L - 1$ trailing zeros to the block. Then, (11) correctly models all the information available to the receiver about the current block $s$, and thus (12) is the best achievable performance of any unbiased equaliser in a TZ-OFDM system.

Remark 4: The simple rule is that $P$ in (11) is chosen so that $y$ contains all the information available to the equaliser in a single received block.

Remark 5: An alternative way of deriving the correct $P$ to use in (11) to model a TZ-OFDM system is to use the channel coded CP-OFDM interpretation. Then, it is clear that $P = C D^H \tilde{P}$ for an appropriately sized cyclic prefix matrix $C$ and DFT matrix $D$. Here, $\tilde{P}$ denotes the channel coder defined in (10).


Although a TZ-OFDM system performs better than an uncoded CP-OFDM system over most channels $h$, the proof of the following theorem uses the channel coded CP-OFDM interpretation of a TZ-OFDM system to construct a channel $h$ over which a TZ-OFDM system performs worse than an uncoded CP-OFDM system.

Theorem 2 Consider sending a single block of $p$ symbols over an FIR channel of length $L$, using either a CP-OFDM precoder $P_1$ of size $(p + L - 1) \times p$ or a TZ-OFDM precoder $P_2$ of size $(p + 2L - 2) \times p$. Here, $P_1 = C D^H$ where $C$ is a $(p + L - 1) \times p$ cyclic prefix matrix and $D$ a DFT matrix, while $P_2 = C\,[D \;\; 0_{p \times (L-1)}]^H$ where $C$ is now a $(p + 2L - 2) \times (p + L - 1)$ cyclic prefix matrix. Let $R_1$ and $R_2$ be the associated CRB matrices, defined in (12), for a given channel vector $h$. Then, there exist values of $p$, $L$ and $h$ for which $\mathrm{tr}\{R_1\} < \mathrm{tr}\{R_2\}$, meaning that a CP-OFDM system can sometimes perform better than a TZ-OFDM system.

PROOF. Choose $p = 3$ and $L = 3$. The TZ-OFDM system spreads the $p = 3$ symbols over $p + L - 1 = 5$ sub-channels. Choose $h$ to have spectral nulls on the 2nd and 5th sub-channels, that is, $h = [1 \;\; -0.618 \;\; 1]^T$. Then $\mathrm{tr}\{R_1\} = 1.29$ and $\mathrm{tr}\{R_2\} = 1.88$. □
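The two traces quoted in the proof can be reproduced with the following sketch. It assumes that $D$ is the unitary DFT matrix (scaled by $1/\sqrt{p}$), which is the normalisation under which the quoted values are obtained; the helper names `channel_matrix`, `cyclic_prefix_matrix` and `crb` are introduced here for illustration.

```python
import numpy as np

def channel_matrix(h, n):
    """(n-L+1) x n Toeplitz matrix of (11); each row holds [h_{L-1}, ..., h_0]."""
    h = np.asarray(h, dtype=complex)
    L = len(h)
    H = np.zeros((n - L + 1, n), dtype=complex)
    for r in range(n - L + 1):
        H[r, r:r + L] = h[::-1]
    return H

def cyclic_prefix_matrix(m, L):
    """(m+L-1) x m matrix of (14): prepends the last L-1 entries of a length-m block."""
    return np.vstack([np.eye(m)[m - (L - 1):], np.eye(m)])

def crb(h, P):
    """CRB matrix R = (P^H H^H H P)^{-1} of (12)."""
    M = channel_matrix(h, P.shape[0]) @ P
    return np.linalg.inv(M.conj().T @ M)

p, L = 3, 3
h = [1.0, -0.618, 1.0]
D = np.fft.fft(np.eye(p)) / np.sqrt(p)            # unitary DFT (assumed normalisation)

P1 = cyclic_prefix_matrix(p, L) @ D.conj().T      # CP-OFDM precoder, 5 x 3
P2 = cyclic_prefix_matrix(p + L - 1, L) @ np.hstack([D, np.zeros((p, L - 1))]).conj().T  # TZ-OFDM, 7 x 3

tr1 = np.trace(crb(h, P1)).real
tr2 = np.trace(crb(h, P2)).real
print(round(tr1, 2), round(tr2, 2))               # compare with 1.29 and 1.88 in the proof
```

With a different DFT scaling both traces change by the same factor, so the inequality of Theorem 2 does not depend on this assumption.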

It is emphasised that $P_1$ and $P_2$ are chosen in Theorem 2 so that (13) correctly models the best CP-OFDM equaliser and the best TZ-OFDM equaliser respectively, where best means the minimum variance unbiased equaliser based on all the available information at the receiver. Since the CP-OFDM equaliser discards the guard interval whereas the TZ-OFDM equaliser does not, it is necessary for $P_2$ to have more rows than $P_1$.

5.  IMPORTANCE  OF  CYCLIC  PREFIX 

For completeness, this section mentions that the cyclic prefix, besides allowing individual data symbols to be transmitted over independent sub-channels, serves another important purpose. A cyclic prefix allows channels with otherwise unstable inverses to be inverted accurately. This is readily demonstrated with the aid of the CRB (12).

Consider sending the symbols 1, 2, 3 over a length $L = 2$ FIR channel by first precoding the symbols to form one of: (A) 0, 1, 2, 3, (B) 1, 2, 3, 0, or (C) 3, 1, 2, 3. Consider the following four test channels:

$h_1 = [1 \;\; 0]^T, \quad h_2 = [0 \;\; 1]^T, \quad h_3 = [1 \;\; -5]^T, \quad h_4 = [-5 \;\; 1]^T.$    (15)


The resulting CRB, given by $R$ in (12), can be calculated for any combination of precoder and channel. Of most interest are the following combinations:

$$R_{A3} = \begin{bmatrix} 1 & 5 & 25 \\ 5 & 26 & 130 \\ 25 & 130 & 651 \end{bmatrix}, \quad R_{B4} = \begin{bmatrix} 651 & 130 & 25 \\ 130 & 26 & 5 \\ 25 & 5 & 1 \end{bmatrix} \quad (16)$$

where $R_{A3}$ denotes the combination of precoder A and channel $h_3$, and similarly for $R_{B4}$. Furthermore, the CRB is not defined (implying that not all the symbols can be recovered) if precoder A is used over channel 2, or if precoder B is used over channel 1. It can be verified that all other combinations lead to reasonable values of $R$. For example,

$$R_{C3} = R_{C4} = \begin{bmatrix} 0.04 & 0.01 & 0.01 \\ 0.01 & 0.04 & 0.01 \\ 0.01 & 0.01 & 0.04 \end{bmatrix}. \quad (17)$$

There is a simple explanation for (16). The channel $h_3$ is non-minimum phase, and hence has an unstable inverse. This is reflected by the exponential growth in the diagonal elements of $R_{A3}$. Precoder B puts a known symbol at the end, and hence performs poorly over channels which have an unstable inverse when run backwards (such as any non-maximum phase channel). Since a channel generated at random has a reasonable chance of being non-minimum phase, it is clear that Precoder A is unsuitable for use in practice, and similarly for Precoder B.

Remark: Note that using Precoder B in (11) does not model a TZ-OFDM system. The correct precoder to use in (11) so as to model a TZ-OFDM system is $P_2$ in Theorem 2, and in particular, it will become clear that the performance of TZ-OFDM systems is not affected by non-minimum phase channels.

This observation can be generalised as follows. Any precoder which does not make the last $L - 1$ transmitted symbols in a block² a known function of the first $L - 1$ transmitted symbols will perform badly if the channel is non-minimum phase. This is because errors introduced by noise near the start of the block build up exponentially in magnitude and go unchecked unless the receiver is able to reconcile the last $L - 1$ symbols with their true values (both Precoders B and C allow this reconciliation, for instance). Similarly, if the first $L - 1$ transmitted symbols are not a known function of the last $L - 1$ transmitted symbols in a block, then channels which are non-maximum phase will lead to an exponential growth of errors in the reverse direction (as shown by $R_{B4}$). A cyclic prefix precoder is distinguished by the fact that it satisfies both properties; the first $L - 1$ symbols of each block are a function of the last $L - 1$ symbols, and the last $L - 1$ symbols of each block are a function of the first $L - 1$ symbols. Therefore, a cyclic prefix prevents an exponential growth of errors regardless of the phase of the channel.

In fact, the key property of the cyclic prefix is that its performance is invariant to the phase of the channel spectrum (see Theorem 3 below, whose straightforward proof is omitted), and moreover, it keeps this property regardless of what other linear operations are performed prior to adding the cyclic prefix. Note too that a cyclic prefix is the most efficient way of attaining the phase invariance property because a necessary condition for the inverse in (12) to exist is for the precoder to introduce at least $L - 1$ redundant symbols.


Theorem 3 In (11), assume that $P$ factorises as $P = CA$ where $A \in \mathbb{C}^{(n-L+1) \times p}$ is an arbitrary linear precoder and $C$ is the cyclic prefix matrix defined in (14). Then, the CRB $R$, defined in (12), depends on $h$ only through the magnitude of the channel spectrum, and in particular, is invariant to the phase of the channel spectrum.
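Theorem 3 can be illustrated numerically by comparing two channels with the same magnitude spectrum but different phase. The sketch below uses a random channel `h` and its time-reversed conjugate `h_rev`, whose DFT magnitudes coincide; the sizes, the random precoder `A` and the helper names are assumptions made for this illustration.

```python
import numpy as np

def channel_matrix(h, n):
    """(n-L+1) x n Toeplitz matrix of (11); each row holds [h_{L-1}, ..., h_0]."""
    h = np.asarray(h, dtype=complex)
    L = len(h)
    H = np.zeros((n - L + 1, n), dtype=complex)
    for r in range(n - L + 1):
        H[r, r:r + L] = h[::-1]
    return H

def cyclic_prefix_matrix(m, L):
    """(m+L-1) x m matrix of (14): prepends the last L-1 entries of a length-m block."""
    return np.vstack([np.eye(m)[m - (L - 1):], np.eye(m)])

def crb(h, P):
    """CRB matrix R = (P^H H^H H P)^{-1} of (12)."""
    M = channel_matrix(h, P.shape[0]) @ P
    return np.linalg.inv(M.conj().T @ M)

rng = np.random.default_rng(2)
n, L, p = 8, 3, 4
m = n - L + 1

A = rng.standard_normal((m, p)) + 1j * rng.standard_normal((m, p))  # arbitrary precoder
P = cyclic_prefix_matrix(m, L) @ A                                   # P = C A

h = rng.standard_normal(L) + 1j * rng.standard_normal(L)
h_rev = h[::-1].conj()        # same magnitude spectrum, different phase

R = crb(h, P)
R_rev = crb(h_rev, P)
same_magnitude = np.allclose(np.abs(np.fft.fft(h, m)), np.abs(np.fft.fft(h_rev, m)))
```

The matrices `R` and `R_rev` agree to numerical precision, consistent with the CRB depending only on the magnitude of the channel spectrum.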


²Here, "block" must be interpreted with care. In the notation of Section 4, it refers to $Ps$. In this sense then, a TZ-OFDM block has $L - 1$ zeros at the start and at the end.


6.  CONCLUSION 

The standard description of a TZ-OFDM system as a CP-OFDM system with the cyclic prefix replaced by a null guard provides little insight into the performance of TZ-OFDM systems. This paper showed that the performance of TZ-OFDM systems can be understood by considering an equivalent channel coded CP-OFDM system. Since CP-OFDM systems are particularly simple to understand (they transmit each data symbol over an independent sub-channel), this interpretation is believed to be an attractive way of understanding TZ-OFDM systems. Indeed, using this interpretation, this paper proved that TZ-OFDM systems spread the power of the source symbols equally across the independent sub-channels. Furthermore, this intuition led to the discovery of a channel for which a TZ-OFDM system performs worse than an uncoded CP-OFDM system. Finally, for completeness, the need for a cyclic prefix (or leading and trailing zeros) was explained in terms of being able to invert accurately non-minimum and non-maximum phase channels.
Abstract: Fast analog-to-digital conversion with only one bit per sample not only makes high sampling rates possible but also reduces the required hardware complexity. For short data buffers or block lengths, it has been shown that tone frequency estimators can be implemented by a simple table look-up. In this paper we present an analysis of such tables using the Hadamard transform. As an outcome of the analysis, we propose a class of nonlinear estimators of low complexity. Their performance is evaluated using numerical simulations. Comparisons are made with the proper Cramer-Rao bound and with the table look-up approach.

1.  INTRODUCTION 

Tone frequency estimation from an $N$-sequence

$\{x[0], \ldots, x[N-1]\}$    (1)

of noise corrupted data is a well-established research area and several estimators have been proposed during the past decades. In this paper, we consider the signal model

$x[n] = s[n] + e[n], \quad s[n] = A \sin(2\pi f n + \phi)$    (2)

where $A > 0$ is the amplitude, $\phi$ the initial phase, and $f$ is the normalized frequency, $0 < f < 1/2$, i.e. $f = F/f_s$ where $F$ is the signal frequency and $f_s$ is the sampling frequency. The frequency $f$ is an unknown parameter and the phase $\phi$ is assumed to be uniformly distributed over the interval $[0, 2\pi]$ (and independent of other signal parameters). The noise is assumed white Gaussian with variance $\sigma^2$.

We make the assumption that the observed data $y[n]$ is a quantized
version of $x[n]$ forming a binary sequence

$$\{y[0], \ldots, y[N-1]\} \tag{3}$$

according to

$$y[n] = \mathrm{sign}(x[n]) \tag{4}$$

where $\mathrm{sign}(x) = 1$ for $x > 0$ and $\mathrm{sign}(x) = -1$
for $x < 0$. In an electronic circuit we would represent such binary
data by ones and zeros.

This work was supported in part by the Junior Individual Grant
Program of the Swedish Foundation for Strategic Research.
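The signal model (2) and the one-bit quantizer (4) can be simulated in a few lines. The sketch below is only illustrative: the function name `one_bit_tone`, the use of NumPy, the definition SNR = A^2/(2 sigma^2), and the choice to map an exact zero to +1 are assumptions made for the example.

```python
import numpy as np

def one_bit_tone(N, f, A=1.0, snr_db=20.0, rng=None):
    """Return (x, y): a noisy tone as in (2) and its sign-quantized version as in (4)."""
    rng = np.random.default_rng() if rng is None else rng
    n = np.arange(N)
    phi = rng.uniform(0.0, 2.0 * np.pi)                 # uniformly distributed initial phase
    sigma2 = A**2 / (2.0 * 10.0 ** (snr_db / 10.0))     # assumes SNR = A^2 / (2 sigma^2)
    x = A * np.sin(2.0 * np.pi * f * n + phi) + rng.normal(0.0, np.sqrt(sigma2), N)
    y = np.where(x >= 0.0, 1, -1)                       # x == 0 mapped to +1 (assumption)
    return x, y

x, y = one_bit_tone(N=16, f=0.1, rng=np.random.default_rng(0))
```

Only the sign pattern `y` would be available to an estimator; `x` is returned here purely for inspection.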

We are interested in estimators that strive to estimate the true
value, say $f_0$ (a deterministic constant), of the unknown frequency
$f$, based on a binary sequence of the observed data according to
(3). Our goal is to find an estimator
$\hat f : \{\pm 1\}^N \to \mathbb{R}$, operating on the observed and
quantized data and optimal in the sense of minimum mean square error
(MMSE). That is, we strive to find the estimator that minimizes
$E[(\hat f - f)^2]$ subject to an assumed a priori distribution for
the unknown frequency $f$. In other words, the a priori distribution
for the frequency is a design parameter of the estimator.

Because of the quantization, the number of possible different
sequences (3) is finite. Hence, a particular observed sequence, of
length $N$, can always be mapped to an index
$i \in \{0, \ldots, M-1\}$, with $M = 2^N$, where we chose the
mapping from an observed sequence to the index $i$ as

$$i = \sum_{n=0}^{N-1} \frac{1 - y[n]}{2} \, 2^n \tag{5}$$

Since there is only a finite number of possible observed sequences,
there is also a finite number of possible estimator outputs. Thus any
estimator can be implemented in two steps: (a) determine the index
$i$ that corresponds to the observed sequence according to (5), and
(b) use this index as a pointer to an entry in a table

$$\{\hat f(0), \hat f(1), \ldots, \hat f(M-1)\} \tag{6}$$

containing all possible frequency estimates. Under the MMSE criterion
we have that the table entries should be chosen as

$$\hat f(i) = E[f \mid i] \tag{7}$$

where the expectation is with respect to the assumed a priori
distribution for $f$, the phase and the noise, conditioned on the
observed sequence (as represented by the index $i$).
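The two-step table look-up, and a crude Monte Carlo approximation of the conditional mean in (7), can be sketched as follows. This is not the training method of [1]; it is a minimal illustration under stated assumptions: bit $n$ of the index is taken as $(1 - y[n])/2$, the prior on $f$ is uniform on [eps, 0.5 - eps], the amplitude is A = 1, and table entries never visited during training fall back to the prior mean 0.25. The names `seq_to_index` and `train_table` are hypothetical.

```python
import numpy as np

def seq_to_index(y):
    """Map a +/-1 sequence to a table index; bit n of i is (1 - y[n]) / 2 (assumed convention)."""
    bits = ((1 - np.asarray(y)) // 2).astype(np.int64)
    return int(np.sum(bits << np.arange(len(bits))))

def train_table(N, trials, snr_db=20.0, eps=0.04, seed=0):
    """Approximate E[f | i] by averaging the drawn f over all trials that produce index i."""
    rng = np.random.default_rng(seed)
    M = 1 << N
    acc = np.zeros(M)
    cnt = np.zeros(M)
    n = np.arange(N)
    sigma = np.sqrt(1.0 / (2.0 * 10.0 ** (snr_db / 10.0)))   # A = 1 (assumption)
    for _ in range(trials):
        f = rng.uniform(eps, 0.5 - eps)
        phi = rng.uniform(0.0, 2.0 * np.pi)
        x = np.sin(2.0 * np.pi * f * n + phi) + sigma * rng.normal(size=N)
        i = seq_to_index(np.where(x >= 0.0, 1, -1))
        acc[i] += f
        cnt[i] += 1
    # Unvisited entries get the prior mean 0.25 (an arbitrary fallback for this sketch).
    return np.where(cnt > 0, acc / np.maximum(cnt, 1), 0.25)

table = train_table(N=4, trials=2000)
estimate = table[seq_to_index([1, 1, -1, -1])]   # step (a): index, step (b): look-up
```

The table has $2^N$ entries, which is why this direct approach stops being practical as $N$ grows.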

In [1] we studied methods for computing estimator tables (6) based on
(7). We also investigated the performance of the resulting MMSE
estimator. As demonstrated in [1], table-based frequency estimation
performs well compared, e.g., with the Cramer-Rao bound for one-bit
quantized data [2]. However, the size of the table grows
exponentially with the block length $N$, and the method is hence not
feasible for block lengths larger than, say, 24-26 samples. Our aim
in the present study is therefore to investigate methods to compress
the table, that is, characterizing the set of possible estimates
$\hat f$ using (much) fewer than $2^N$ table entries. Our main tool
in achieving such compression is the Hadamard transform, as explained
next.

2.  THE  HADAMARD  TRANSFORM 
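We note that any function $\gamma : \{0, \ldots, M-1\} \to \mathbb{R}$,
where $M = 2^N$ and with a finite domain represented by the integers
$\{0, \ldots, M-1\}$, can be expanded as

$$\gamma(i) = \mathbf{t}^T \mathbf{h}(i), \quad \text{with} \quad
\mathbf{h}(i) =
\begin{bmatrix} 1 \\ y[N-1] \end{bmatrix} \otimes \cdots \otimes
\begin{bmatrix} 1 \\ y[1] \end{bmatrix} \otimes
\begin{bmatrix} 1 \\ y[0] \end{bmatrix}
= \begin{bmatrix} 1 & y[0] & y[1] & y[0]y[1] & \cdots & \prod_{n=0}^{N-1} y[n] \end{bmatrix}^T \tag{8}$$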


where $\otimes$ denotes the Kronecker matrix product and with the
relation between the index $i$ and the binary variables $\{y[n]\}$
defined as in (5). The vector $\mathbf{t}$, with elements $\{t_m\}$,
is then the Hadamard transform of
$\mathbf{g} = [\gamma(0) \cdots \gamma(M-1)]^T$, computed as

$$\mathbf{t} = 2^{-N} \mathbf{H} \mathbf{g} \tag{9}$$

where $\mathbf{H}$ is the size $M \times M$ Hadamard matrix, with
rows $\mathbf{h}(0)^T, \ldots, \mathbf{h}(M-1)^T$. Computing
$\mathbf{t}$, as in (9), requires $O(NM)$ operations [3]. We see that
the representation $\gamma(i) = \mathbf{t}^T \mathbf{h}(i)$ gives the
value $\gamma(i)$ in terms of the "bits" $\{y[n]\}$ of the index $i$.
This property has proven to be of great use in synthesis and analysis
of quantizers [4]. In the application studied in this paper, the
finite-domain function of interest is the estimator $\hat f(i)$, and
the binary variables $\{y[n]\}$ are the one-bit quantized data
samples (4). Using (8) we conclude that the Hadamard transform can
hence be employed to represent this estimator as

$$\hat f(i) = \mathbf{t}^T \mathbf{h}(i) = \sum_{m=0}^{M-1} t_m h_m(i)
= t_0 + t_1 y[0] + t_2 y[1] + t_3 y[0]y[1] + \cdots + t_{M-1} \prod_{n=0}^{N-1} y[n]. \tag{10}$$

That is, $\hat f$ can be represented in terms of the transform
coefficients $\{t_m\}$ and all possible different products that can
be formed using the variables $\{y[n]\}$. For a given $\hat f(i)$ the
coefficients $\{t_m\}$ (the t-coefficients, for short) are calculated
via the Hadamard transform. It is important to note that the
representation (10) is exact.

Fig. 1. The normalized magnitudes of the first 72 t-coefficients in
(10) for a fixed table of size $M = 2^{16}$. The coefficients above
the dashed line correspond to weight two binary products of
neighboring samples (□), samples at distance 3 (o) and 5 (∇),
respectively.

We aim to use (10) as a basis for reducing the number of parameters
needed in implementing the estimator $\hat f$, noting that the
t-coefficients completely define $\hat f$. However, since there are
$M$ different $t_m$ nothing is gained by using (10) to implement the
estimator (on the contrary there is a loss in computational
complexity since the sum in (10) needs to be calculated, while a
table look-up implementation based on (6) basically requires no
computation at all). It is reasonable, however, to assume that not
all of the t-coefficients are significant (in the sense that some of
them are zero or close to zero). Hence, if we can identify the
t-coefficients that are most significant we need only to store these
and then use (10) (setting "insignificant" coefficients to zero) to
compute an approximate estimate. Compared with using a table look-up
implementation we can hence use such an approach to trade storage
complexity for computations.

3.  TABLE  ANALYSIS

Consider a known table used in a table look-up frequency estimator
(6), say $\mathbf{g}$. That is

$$\mathbf{g} = [\hat f(0), \hat f(1), \ldots, \hat f(M-1)]^T \tag{11}$$


The table entries in $\mathbf{g}$ can be expressed as a function of
the corresponding t-coefficients and a binary representation of the
entry index, as in (10). To illustrate the structure of the
t-coefficients we use a table (11) trained at
SNR $= A^2/(2\sigma^2) = 20$ dB and for a block length $N = 16$,
according to [1]. The t-coefficients for this table are computed, as
in (9), and their normalized magnitudes $|t_j|/t_0$ are displayed in
Figure 1. We see that there exist coefficients that are significantly
larger in magnitude than the rest (marked in Figure 1 above the
dashed line). In further analyzing the t-coefficients we note that
all the dominant t-coefficients correspond to a weight two product in
the sum (10), i.e. $t_3$ is multiplied with the product $y[0]y[1]$
and $t_6$ is multiplied with $y[1]y[2]$ and so forth. Further we can
divide the dominant t-coefficients into two sets:

A) t-coefficients that correspond to a weight two product of
neighboring samples, for example $t_{12}$ corresponding to the
product $y[2]y[3]$, or $t_{24}$ corresponding to the product
$y[3]y[4]$.

B) t-coefficients that correspond to a weight two product of samples
separated by a distance of an even number of samples (that is, with
an even number of samples between them). The set B is exemplified by
$t_9$ corresponding to the binary product $y[0]y[3]$, or $t_{33}$
corresponding to $y[0]y[5]$.

The coefficient $t_0$ is included in both sets. Neighboring samples
are separated by a zero distance, hence set A is a subset of B. Using
one of the sets A or B we can form an approximation of each entry in
the true $\mathbf{g}$ and build a table estimate $\hat{\mathbf{g}}$.
By calculating an entry estimate only when needed, fewer coefficients
need to be stored. Accordingly, the memory complexity is reduced from
storing the entire table with $2^N$ coefficients to $N$ or
$N^2/4 + 1$ using set A or B, respectively. That is, a reduction from
an exponential to a polynomial relation between the block length and
the number of coefficients. A block diagram of a type-A estimator is
given in Figure 2.

Fig. 2. A proposed estimator where neighboring binary products of
weight two are used (type-A). [Block diagram labeled "N-bit shift
register".]
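The transform (9), the exactness of (10), and the index sets A and B can be checked numerically on a small example. The sketch below is illustrative only: the table `g` is random stand-in data rather than a trained table, the bit convention $y[n] = 1 - 2\,(\text{bit } n \text{ of } i)$ is an assumption, and the helper names are hypothetical.

```python
import numpy as np

def fwht(g):
    """Unnormalized fast Walsh-Hadamard transform (natural ordering), O(N M) operations."""
    t = np.array(g, dtype=float)
    h = 1
    while h < len(t):
        for start in range(0, len(t), 2 * h):
            a = t[start:start + h].copy()
            b = t[start + h:start + 2 * h].copy()
            t[start:start + h] = a + b
            t[start + h:start + 2 * h] = a - b
        h *= 2
    return t

def h_vector(i, N):
    """h(i) = [1, y[0], y[1], y[0]y[1], ...] with y[n] = 1 - 2 * (bit n of i) (assumed)."""
    y = 1 - 2 * ((i >> np.arange(N)) & 1)
    h = np.ones(1)
    for n in range(N):
        h = np.concatenate([h, h * y[n]])
    return h

def pair_indices(N, lags):
    """Indices m = 2^n + 2^(n+d) of the weight-two products y[n] y[n+d] for the given lags d."""
    return sorted((1 << n) + (1 << (n + d)) for d in lags for n in range(N - d))

N = 4
g = np.random.default_rng(1).uniform(0.0, 0.5, 1 << N)   # stand-in table, not a trained one
t = fwht(g) / (1 << N)                                   # t = 2^{-N} H g, as in (9)
set_A = [0] + pair_indices(N, [1])                       # t_0 plus neighboring products
set_B = [0] + pair_indices(N, range(1, N, 2))            # t_0 plus all odd index differences
```

For this toy size, `set_A` has $N = 4$ entries and `set_B` has $N^2/4 + 1 = 5$, matching the coefficient counts stated above; an odd index difference corresponds to an even number of samples in between.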

4.  ESTIMATOR  DESIGN 

It was shown above how to form an approximation of each table entry
using a reduced set of t-coefficients. Calculating the entire set of
t-coefficients requires storage of the full table $\mathbf{g}$. This
is not feasible for, say, $N > 26$. The structure of the approximate
estimator is, however, independent of $N$. Here, we use the structure
of such an estimator and calculate the corresponding reduced set of
coefficients under the MMSE criterion.

Let $\mathbf{h}_A(i)$ and $\mathbf{h}_B(i)$ denote vectors containing
the signal products in (10) corresponding to the t-coefficients in
the sets A and B, respectively. That is,

$$\mathbf{h}_A(i) = \begin{bmatrix} 1 \\ y[0]y[1] \\ y[1]y[2] \\ \vdots \end{bmatrix}, \qquad
\mathbf{h}_B(i) = \begin{bmatrix} 1 \\ y[0]y[1] \\ y[1]y[2] \\ y[0]y[3] \\ \vdots \end{bmatrix} \tag{12}$$

where the relation between the index $i$ and the sequence $y[n]$ is
given by (5). We denote the corresponding vectors with t-coefficients
by $\mathbf{t}_A$ and $\mathbf{t}_B$, respectively. We can now
formulate two corresponding frequency estimators as

$$\hat f_A(i) = \mathbf{t}_A^T \mathbf{h}_A(i), \qquad \hat f_B(i) = \mathbf{t}_B^T \mathbf{h}_B(i). \tag{13}$$


In order to optimize the performance of the estimators in (13) let
$\mathbf{t}_k$ be a design parameter to be chosen optimally. Using
the MMSE criterion $\mathbf{t}_k$ is given by

$$\mathbf{t}_k = \arg\min_{\mathbf{a}} E\big[(f - \mathbf{a}^T \mathbf{h}_k(i))^2\big]
= \big(E[\mathbf{h}_k(i) \mathbf{h}_k(i)^T]\big)^{-1} E[\mathbf{h}_k(i) f], \qquad k = A, B \tag{14}$$

where the expectation is with respect to frequency $f$, phase $\phi$
and noise $e[n]$.

A feasible approach to calculate the expectations needed in (14) is
by aid of Monte Carlo integration. Such a training procedure for the
problem at hand is discussed in [1].
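One simple way to approximate (14) by Monte Carlo integration is to replace the expectations with sample averages, which turns (14) into an ordinary least-squares problem. The sketch below does this for the type-A structure. It is not the training procedure of [1]; the uniform prior on [eps, 0.5 - eps], the unit amplitude, the number of trials, and the function names are assumptions chosen for illustration.

```python
import numpy as np

def features_A(Y):
    """Rows h_A = [1, y[0]y[1], y[1]y[2], ...] for each +/-1 sequence stored in the rows of Y."""
    return np.hstack([np.ones((Y.shape[0], 1)), Y[:, :-1] * Y[:, 1:]])

def draw(N, trials, snr_db, eps, rng):
    """Draw frequencies from a uniform prior together with the matching one-bit sequences."""
    f = rng.uniform(eps, 0.5 - eps, trials)
    phi = rng.uniform(0.0, 2.0 * np.pi, trials)
    sigma = np.sqrt(1.0 / (2.0 * 10.0 ** (snr_db / 10.0)))   # A = 1 (assumption)
    n = np.arange(N)
    x = np.sin(2.0 * np.pi * f[:, None] * n + phi[:, None])
    x = x + sigma * rng.normal(size=(trials, N))
    return f, np.where(x >= 0.0, 1.0, -1.0)

rng = np.random.default_rng(0)
f, Y = draw(N=64, trials=20000, snr_db=20.0, eps=0.04, rng=rng)
H = features_A(Y)
# Sample version of (14): the least-squares solution of H t = f.
t_A, *_ = np.linalg.lstsq(H, f, rcond=None)
f_hat = H @ t_A                       # type-A estimates as in (13)
mse_train = np.mean((f_hat - f) ** 2)
```

Only $N$ coefficients are stored for $N = 64$, whereas a full table of $2^{64}$ entries could not be built at all. The same pattern applies to set B by extending `features_A` with the additional products.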


5.  NUMERICAL  EVALUATION 

In Figure 3, the empirical mean square error (MSE) is shown as a
function of SNR for a data record of length $N = 16$. The performance
of the estimator using the full table $\mathbf{g}$ in (11) is
compared with using subsets of parameters, that is type-A and type-B
in (13), respectively. As reference, the asymptotic ($N \to \infty$)
CRB for the given signal model is included [2]. The table
$\mathbf{g}$ in (11) is obtained using a training approach discussed
in [1]. The t-coefficients $\mathbf{t}_A$ and $\mathbf{t}_B$ for
$\hat f_A(i)$ and $\hat f_B(i)$ are calculated according to (14)
using Monte Carlo integration at SNR = 20 dB. The a priori
distribution of $f$ is chosen as a uniform distribution on the
interval $[\epsilon, 0.5 - \epsilon]$ where $\epsilon$ is a design
parameter and has been set to $\epsilon = 0.04$. Our experience
indicates that a smaller value of $\epsilon$ typically results in a
significant performance reduction while a larger value does not
appear to influence the performance negatively. In Figure 3 (as well
as in Figure 4), the performance is evaluated for the signal in (2)
with the true frequency $f_0 = 0.1$. Further the MSE figures are
averaged over 100,000 independent trials. From Figure 3, we note a
decreased performance when the complexity of the estimator is
reduced. We observe further that for high SNRs the performance of
(11) starts to deviate from the CRB due to a non-negligible bias term
in the MSE. For $\hat f_A(i)$ and $\hat f_B(i)$ the bias is even more
significant.

Fig. 3. Performance of the proposed estimators for $N = 16$.
Displayed are: CRB (*), table look-up estimator (6) (+), estimator
$\hat f_A(i)$ (o) and $\hat f_B(i)$ (◊).

The experiment is repeated in Figure 4, but now for $N = 64$. In this
case, it is not feasible to implement (11) and it is therefore
excluded from the comparison. From the figure, we note that the
performance of $\hat f_B(i)$ almost coincides with the asymptotic CRB
for all SNRs above a threshold at about 15 dB. We also note that the
difference in performance between $\hat f_A(i)$ and $\hat f_B(i)$ is
negligible for low SNRs. At high SNRs the difference is more
significant.

In Figure 5, the empirical MSE is shown as a function of the unknown
signal frequency $f_0$ at a fixed SNR = 20 dB and block length
$N = 64$. As a reference the asymptotic CRB is displayed. We observe
that both estimators, $\hat f_A(i)$ and $\hat f_B(i)$, perform well,
except at frequencies near 0 or 0.5, and that the difference in
performance between them is negligible. However, the performance
depends on the unknown signal frequency $f_0$ and for some isolated
frequencies it deteriorates significantly.

6.  SUMMARY  AND  CONCLUSIONS 

We have shown that the table-based approach used as a frequency
estimator in [1] can be transformed using the Hadamard transform to
an equivalent representation based on a sum over binary products and
a set of coefficients. We have also investigated how the set of
coefficients can be reduced, and how such reduction makes it possible
to handle large blocks of data. We furthermore showed how the
remaining coefficients can be optimized to increase the performance
of the estimator with reduced complexity. The performance of the new
estimators was then evaluated by aid of simulations and compared with
the appropriate Cramer-Rao bound. The simulations indicated that the
considered methods are able to produce nearly statistically efficient
estimates.

Fig. 4. Performance of the proposed estimators for $N = 64$.
Displayed are: CRB (*), estimator $\hat f_A(i)$ (o) and
$\hat f_B(i)$ (◊).

Fig. 5. Empirical MSE as a function of the frequency $f$ for $N = 64$
and SNR = 20 dB. Displayed are: CRB (+), $\hat f_A(i)$ (- -) and
$\hat f_B(i)$ (-).

REFERENCES 

[1] T. Andersson, M. Skoglund, and P. Händel, "Frequency estimation
by 1-bit quantization and table look-up processing," in Proc.
EUSIPCO, Finland, pp. 1807-1810, Sep. 2000.

[2] A. Høst-Madsen and P. Händel, "Effects of sampling and
quantization on single tone frequency estimation," IEEE Trans. Signal
Processing, Vol. 48, No. 3, pp. 650-662, 2000.

[3] F. J. MacWilliams and N. J. A. Sloane, The Theory of
Error-Correcting Codes, North-Holland, Amsterdam, 1977.

[4] P. Hedelin, P. Knagenhjelm, and M. Skoglund, "Theory for
transmission of vector quantization data," in Speech Coding and
Synthesis, W. B. Kleijn and K. K. Paliwal, Eds., chapter 10,
pp. 347-396. Elsevier Science, 1995.




BEST  QUADRATIC  UNBIASED  ESTIMATOR  (BQUE)  FOR  TIMING  AND 

FREQUENCY  SYNCHRONIZATION 

Javier  Villares  and  Gregori  Vazquez 


Department  of  Signal  Theory  and  Communications,  Polytechnic  University  of  Catalonia 
UPC  Campus  Nord  -  Modul  D5,  c/Jordi  Girona  1-3,  08034  Barcelona  (Spain) 
e-mail: {javi, gregori}@gps.tsc.upc.es


ABSTRACT 

This paper deals with the optimal design of quadratic Non-Data-Aided
(NDA) open- and closed-loop estimators. The new approach supplies the
minimum variance, unbiased NDA quadratic estimators without the need
to assume given statistics for the nuisance parameters, that is,
avoiding the common adoption of the Gaussian assumption, which does
not apply in digital communications. Alternatively, if the unbiased
constraint is relaxed, a Bayesian open-loop estimator is presented
and its performance compared with the open-loop BQUE solution. In
addition, the closed-loop BQUE is developed, showing that it
outperforms the well-known 'ad hoc' Gaussian Stochastic Maximum
Likelihood (GSML) scheme for short observation windows, that is, for
low-complexity implementations, only converging to the UCRBG
asymptotically. Finally, the quadratic analysis is naturally extended
to higher-order techniques which exhibit a better performance for
high SNRs.

1.  INTRODUCTION 

Non-Data-Aided (NDA) synchronization has lately received a lot of
attention because it simplifies the system protocols and makes
unnecessary the transmission of training sequences (preambles) that
significantly reduce the spectral efficiency.

So far, most algorithms have been devised by heuristic reasoning. In
[1] [2] the authors presented a general framework that allowed the
formulation of any NDA synchronizer based on second order moments
under a Maximum Likelihood (ML) perspective. There are two basic
reasons for limiting the analysis to quadratic synchronizers. First,
it is shown in [1] that the stochastic ML solution becomes quadratic
for low SNRs, being still unknown for moderate or high SNRs because
of the difficult treatment of the unknown transmitted symbols.
Second, quadratic synchronizers allow efficient digital
implementations.

In this paper, we propose a totally different approach to the design
of NDA discriminators that does not have to cope with the symbol
extraction problem as in [1]. Making use of basic concepts from
estimation theory [3], we have deduced the Best Quadratic Unbiased
Estimator (BQUE), that is, an estimator of the desired parameter that
is quadratic, unbiased and has minimum variance. Its name has been
chosen by analogy with the classical Best Linear Unbiased Estimator
(BLUE) [3]. The BQUE approach unifies the design of open- and
closed-loop synchronizers by constraining the value and/or slope of
certain points of the S-curve and optimizing its performance within
the expected operation range.

The structure of the paper is the following. The signal model and the
problem statement are presented in Section 2. Section 3 introduces
the algebraic notation used throughout the paper. Section 4 deduces
the exact solution to the open-loop BQUE, and closed-form expressions
are obtained with the help of a discrete approximation. In Section 5
the non-bias constraint is lifted and an alternative open-loop
Bayesian discriminator having minimum mean squared error is
presented. Section 6 deduces the closed-loop BQUE for tracking.
Section 7 compares the BQUE and Gaussian Stochastic Maximum
Likelihood (GSML) feedback detectors with the Gaussian Unconditional
Cramer-Rao Bound (UCRBG). Section 8 extends the results to
higher-order estimators. Simulation results and comments on them can
be found in Section 9 and, finally, conclusions are drawn in
Section 10.

2.  DISCRETE-TIME  SIGNAL  MODEL 

Many problems in the signal processing field can be unified using the
following signal model:

$$\mathbf{r} = \mathbf{A}_\lambda \cdot \mathbf{x} + \mathbf{w} \tag{1}$$

where $\mathbf{r}$ is a vector containing $N$ samples of the received
signal ($N_{ss}$ samples per symbol), $\lambda$ is the parameter to
estimate (for instance, the timing error or frequency error) embedded
in the transfer matrix $\mathbf{A}_\lambda$, $\mathbf{x}$ is the
vector of transmitted symbols (unknown in an NDA scheme), and
$\mathbf{w}$ is the vector of noise samples with covariance matrix
$\mathbf{R}_w = E\{\mathbf{w}\mathbf{w}^H\}$.

3.  NOTATION 

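The following notation is introduced here to facilitate the deduction
of the proposed BQU estimators:

$$\begin{aligned}
\hat{\mathbf{R}}_\lambda &= \mathbf{r}\mathbf{r}^H \\
\mathbf{R}_\lambda &= E\{\hat{\mathbf{R}}_\lambda\} = \mathbf{A}_\lambda \mathbf{A}_\lambda^H + \mathbf{R}_w \\
\mathbf{S}_\lambda &= \frac{\partial \mathbf{R}_\lambda}{\partial \lambda} = \mathbf{D}_\lambda \mathbf{A}_\lambda^H + \mathbf{A}_\lambda \mathbf{D}_\lambda^H \; ; \quad \mathbf{D}_\lambda = \frac{\partial \mathbf{A}_\lambda}{\partial \lambda} \\
\hat{\mathcal{R}}_\lambda &= \mathrm{vec}(\hat{\mathbf{R}}_\lambda) \; ; \quad \mathcal{R}_\lambda = \mathrm{vec}(\mathbf{R}_\lambda) \; ; \quad \mathcal{S}_\lambda = \mathrm{vec}(\mathbf{S}_\lambda)
\end{aligned} \tag{2}$$

where $\hat{\mathbf{R}}_\lambda$ is the sample covariance matrix and
the operator $\mathrm{vec}(\mathbf{M})$ stacks the columns of
$\mathbf{M}$ in the column vector $\mathcal{M}$. Using the notation
defined above, the generic equation of a quadratic discriminator is:

$$\hat\lambda = \mathbf{r}^H \mathbf{M} \mathbf{r} = \mathrm{Tr}(\mathbf{M} \hat{\mathbf{R}}_\lambda) = \mathcal{M}^H \hat{\mathcal{R}}_\lambda \tag{3}$$

where $\mathrm{Tr}(\cdot)$ is the trace operator, $\mathbf{M}$ is the
matrix containing the discriminator coefficients (complex) and
$\mathcal{M}$ is defined as:

$$\mathcal{M} = \mathrm{vec}(\mathbf{M}^H) \tag{4}$$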
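The three equivalent forms of the quadratic discriminator in (3) can be verified numerically. The sketch below uses random complex data as stand-ins for the received vector and the coefficient matrix; the helper `vec` and all variable names are illustrative assumptions, not part of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4
r = rng.normal(size=N) + 1j * rng.normal(size=N)            # stand-in received vector
M = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))  # stand-in coefficient matrix

def vec(X):
    """Stack the columns of X into a single vector."""
    return X.reshape(-1, order="F")

R_hat = np.outer(r, r.conj())                    # sample covariance r r^H
quad = r.conj() @ M @ r                          # r^H M r
via_trace = np.trace(M @ R_hat)                  # Tr(M R_hat)
via_vec = vec(M.conj().T).conj() @ vec(R_hat)    # vec(M^H)^H vec(R_hat), as in (3)-(4)
```

The vectorized form is what makes the later optimization a standard quadratic problem in the stacked coefficient vector.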

4.  OPEN-LOOP  (OL)  BEST  QUADRATIC 
UNBIASED  ESTIMATOR  (OL-BQUE) 

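The minimum variance unbiased (MVU) estimator $\mathcal{M}$ when the
received parameter is $\lambda$ has this expression:

$$\mathcal{M} = \arg\min_{\mathcal{M}} E\{|\hat\lambda|^2\} \tag{5}$$

subject to $\mathcal{M}^H \mathcal{R}_\lambda = \lambda$ and, thus,
the cost function to minimize is the following:

$$C(\lambda) = \mathcal{M}^H \mathbf{Q}_\lambda \mathcal{M} + (\mathcal{M}^H \mathcal{R}_\lambda - \lambda)\, \alpha_\lambda \tag{6}$$

where $\mathbf{Q}_\lambda = E\{\hat{\mathcal{R}}_\lambda \hat{\mathcal{R}}_\lambda^H\}$
and the Lagrange multiplier $\alpha_\lambda$ imposes the non-bias
constraint $\mathcal{M}^H \mathcal{R}_\lambda = \lambda$. The
fourth-order moments contained in $\mathbf{Q}_\lambda$ can be
computed as follows for any symmetric constellation
($E\{x_i^n\} = 0 \; \forall i$ if $n$ is odd) with uncorrelated and
identically distributed symbols
($E\{x_i x_j^*\} = E\{x_i x_j\} = 0 \; \forall i \neq j$):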

Qx{X!i  +  j.Nk  +  l)  =  E  }  =  R,jRki  +  R,iRkj  + 

+  (/M  -  2p2)  ■  (aiOaj)  (a^oai)"  (7) 

+  (a:  Op)  aJ  (ajop)  af1  -  (aiQa/OPOP)  (akO»i)" 

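where $R_{ij}$ is the element $(i, j)$ of $\mathbf{R}_\lambda$,
$\mathbf{a}_n$ the $n$-th row of $\mathbf{A}_\lambda$,
$\mu_4 = E\{|x_n|^4\}$ and $\mu_2 = E\{|x_n|^2\}$ ($\forall n$),
$\rho_n = E\{x_n^2\} = E\{(x_n^*)^2\}$ the $n$-th element of the row
vector $\boldsymbol{\rho}$ and $\odot$ stands for the Hadamard
product of matrices.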

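Any statistical a priori knowledge of the parameter of interest
$\lambda \in \Lambda = \{|\lambda| < \Delta_\lambda\}$ can be
introduced in the optimization process by means of a Bayesian
approach, that is, by averaging the cost function in (6) with respect
to the adopted prior:

$$C_{ol} = E_\lambda\{C(\lambda)\} = \int_\Lambda C(\lambda) W_\lambda \, d\lambda
= \mathcal{M}^H \left( \int_\Lambda \mathbf{Q}_\lambda W_\lambda \, d\lambda \right) \mathcal{M}
+ \mathcal{M}^H \int_\Lambda \mathcal{R}_\lambda \bar\alpha_\lambda \, d\lambda \tag{8}$$

where $\bar\alpha_\lambda = W_\lambda \alpha_\lambda$ and all the
irrelevant terms have been dropped from the last equation. The
weighting function $W_\lambda = f_\lambda(\lambda)$ is the prior of
the parameter of interest. If no a priori knowledge is available,
then a uniform prior shall be adopted within the operative range
$\Lambda$, that is, $W_\lambda = 1/(2\Delta_\lambda)$. The solution
of (8) has the following expression:

$$\mathcal{M} = \bar{\mathbf{Q}}^{-1} \bar{\mathcal{R}} \tag{9}$$

with $\bar{\mathbf{Q}} = \int_\Lambda \mathbf{Q}_\lambda W_\lambda \, d\lambda$
and $\bar{\mathcal{R}} = \int_\Lambda \mathcal{R}_\lambda \bar\alpha_\lambda \, d\lambda$.
The value of the multipliers $\bar\alpha_\lambda$
($\forall \lambda \in \Lambda$) that force the unbiased solution
within the whole interval $\Lambda$ requires the solution to the
following system of integral equations:

$$\mathcal{M}^H \mathcal{R}_\lambda = \bar{\mathcal{R}}^H \bar{\mathbf{Q}}^{-1} \mathcal{R}_\lambda
= \left[ \int_\Lambda \mathcal{R}_u^H \bar\alpha_u^* \, du \right] \bar{\mathbf{Q}}^{-1} \mathcal{R}_\lambda = \lambda \tag{10}$$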


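At this point, we have opted for a discrete approximation of the
integral in (10) considering only a finite set of constraints
$\boldsymbol{\lambda}_s = [\lambda_1, \cdots, \lambda_L]^T$. Thus, we
obtain that $\bar{\mathcal{R}} \approx \mathbf{R}_s \boldsymbol{\alpha}$
and (10) becomes
$\mathbf{R}_s^H \bar{\mathbf{Q}}^{-1} \mathbf{R}_s \boldsymbol{\alpha} = \boldsymbol{\lambda}_s$
(after some manipulations) incorporating the definitions below:

$$\mathbf{R}_s = [\mathcal{R}_{\lambda_1} \cdots \mathcal{R}_{\lambda_L}], \qquad
\boldsymbol{\alpha} = [\bar\alpha_{\lambda_1}, \cdots, \bar\alpha_{\lambda_L}]^T \tag{11}$$

The discrete approximation of the solution is therefore:

$$\mathcal{M} = \bar{\mathbf{Q}}^{-1} (\mathbf{R}_s \boldsymbol{\alpha})
= \bar{\mathbf{Q}}^{-1} \mathbf{R}_s \left( \mathbf{R}_s^H \bar{\mathbf{Q}}^{-1} \mathbf{R}_s \right)^{\#} \boldsymbol{\lambda}_s \tag{12}$$

It turns out that the matrix
$\mathbf{R}_s^H \bar{\mathbf{Q}}^{-1} \mathbf{R}_s$ in (12) can be
singular if the oversampling ($N_{ss}$) and the length of
$\mathbf{r}$ are not large enough. In that case, the set of
constraints $\boldsymbol{\lambda}_s$ cannot be exactly fulfilled and
the pseudo-inverse operator $(\cdot)^{\#}$ will supply the
least-squares fitting.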
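The structure of the discrete solution (12) can be illustrated with synthetic matrices. In the sketch below, `Q` and `R_s` are random stand-ins (a Hermitian positive-definite matrix and a set of constraint vectors), not quantities computed from a real signal model; the dimensions and variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
P, L = 16, 3                                     # P stacked entries, L constraint points
B = rng.normal(size=(P, P)) + 1j * rng.normal(size=(P, P))
Q = B @ B.conj().T + P * np.eye(P)               # synthetic Hermitian positive-definite Q
R_s = rng.normal(size=(P, L)) + 1j * rng.normal(size=(P, L))   # synthetic constraint vectors
lam_s = np.array([-0.25, 0.0, 0.25])             # parameter values where unbiasedness is imposed

Qinv_Rs = np.linalg.solve(Q, R_s)                # Q^{-1} R_s without forming the inverse
G = R_s.conj().T @ Qinv_Rs                       # R_s^H Q^{-1} R_s
M_vec = Qinv_Rs @ (np.linalg.pinv(G) @ lam_s)    # discrete solution, as in (12)
achieved = M_vec.conj() @ R_s                    # M^H R_{lambda_k} for each constraint point
```

Here `G` is nonsingular, so the constraints are met exactly; when `G` is rank deficient, `np.linalg.pinv` returns the least-squares fit described above.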


5.  UNCONSTRAINED  OPEN-LOOP  BAYESIAN 
ESTIMATOR  (OL-BAYES) 

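In this section we present an alternative criterion to design
open-loop estimators from a Bayesian approach when relaxing the
unbiased constraint. The discriminator coefficients $\mathcal{M}$
will be those that minimize the following cost function:

$$C_{bay} = E_\lambda \left\{ E \, \| \mathcal{M}^H \hat{\mathcal{R}}_\lambda - \lambda \|^2 \right\}
= \mathcal{M}^H \bar{\mathbf{Q}} \mathcal{M} - \mathcal{M}^H \bar{\mathcal{R}} \tag{13}$$

where the last equivalence only conserves the terms dependent on
$\mathcal{M}^H$ and where, in this section,
$\bar{\mathcal{R}} = \int_\Lambda \mathcal{R}_\lambda W_\lambda \lambda \, d\lambda$
and $\bar{\mathbf{Q}}$ was introduced in (8).

The expression of the discriminator $\mathcal{M}$ minimizing (13) is
therefore:

$$\mathcal{M} = \bar{\mathbf{Q}}^{-1} \bar{\mathcal{R}} \tag{14}$$

and both $\bar{\mathcal{R}}$ and $\bar{\mathbf{Q}}$ admit analytical
solutions for uniform priors, i.e., $W_\lambda = 1/(2\Delta_\lambda)$.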


6.  CLOSED-LOOP  (CL)  BEST  QUADRATIC 
UNBIASED  ESTIMATOR  (CL-BQUE) 


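In this section the estimator is required to track the parameter
fluctuations around $\lambda = 0$ with minimum variance for a given
loop bandwidth, i.e., for a given value of the S-curve slope at
$\lambda = 0$. The sought BQU discriminator of the parameter errors
will be that minimizing the following cost function:

$$C_{cl} = \mathcal{M}^H \mathbf{Q}_0 \mathcal{M} + \mathcal{M}^H \mathcal{R}_0 \cdot \alpha_0 + (\mathcal{M}^H \mathcal{S}_0 - 1) \cdot \beta_0 \tag{15}$$

where the Lagrange multipliers $\alpha_0$ and $\beta_0$ impose the
non-bias constraints $\mathcal{M}^H \mathcal{R}_0 = 0$ and
$\mathcal{M}^H \mathcal{S}_0 = 1$ at $\lambda = 0$.

It can be shown that the constraint associated with $\alpha_0$ is
always fulfilled due to the discriminator symmetry and the CL-BQUE
solution reduces to:

$$\mathcal{M} = \beta_0 \cdot \mathbf{Q}_0^{-1} \mathcal{S}_0 = \frac{\mathbf{Q}_0^{-1} \mathcal{S}_0}{\mathcal{S}_0^H \mathbf{Q}_0^{-1} \mathcal{S}_0} \tag{16}$$

and its tracking error variance is therefore:

$$\sigma_{\hat\lambda}^2 = \frac{1}{\mathcal{S}_0^H \mathbf{Q}_0^{-1} \mathcal{S}_0} \tag{17}$$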
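The step from the cost function (15) to the closed forms (16) and (17) can be sketched as follows. This is a reading of the argument, not the authors' derivation: it assumes that $\mathbf{Q}_0$ is Hermitian positive definite, that the $\alpha_0$ constraint is inactive as stated above, and that constant factors and signs are absorbed into the scalar $c$.

```latex
\begin{aligned}
\frac{\partial C_{cl}}{\partial \mathcal{M}^H}
  = \mathbf{Q}_0 \mathcal{M} + \beta_0 \mathcal{S}_0 = 0
  &\;\Rightarrow\; \mathcal{M} = c \, \mathbf{Q}_0^{-1} \mathcal{S}_0 ,
  \qquad c = -\beta_0 \\
\mathcal{M}^H \mathcal{S}_0 = 1
  &\;\Rightarrow\; c = \frac{1}{\mathcal{S}_0^H \mathbf{Q}_0^{-1} \mathcal{S}_0} \\
E\{|\hat\lambda|^2\} = \mathcal{M}^H \mathbf{Q}_0 \mathcal{M}
  &= c^2 \, \mathcal{S}_0^H \mathbf{Q}_0^{-1} \mathcal{S}_0
  = \frac{1}{\mathcal{S}_0^H \mathbf{Q}_0^{-1} \mathcal{S}_0}
\end{aligned}
```

The scalar $c$ is real and positive because $\mathcal{S}_0^H \mathbf{Q}_0^{-1} \mathcal{S}_0$ is a positive quadratic form under the stated assumption on $\mathbf{Q}_0$.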
Figure 1: Normalized timing variance ($\sigma^2/T^2$) for the GSML
and CL-BQUE discriminators, $N = 4$. The fourth-order discriminator
proposed in reference [4] is also plotted and compared with the
MCRB [5].


7.  CL-BQUE  VS.  MAXIMUM  LIKELIHOOD 
APPROACH 

Another suitable approach to the problem is to treat the received
data ($\mathbf{r}$) as if they were Gaussian. The resulting
discriminator, called Gaussian Stochastic Maximum Likelihood (GSML),
has the following structure in the uniparametric case [2]:

$$\hat\lambda = \frac{\mathrm{Tr}(\mathbf{R}_0^{-1} \mathbf{S}_0 \mathbf{R}_0^{-1} \hat{\mathbf{R}}_\lambda)}{\mathrm{Tr}\left[ (\mathbf{R}_0^{-1} \mathbf{S}_0)^2 \right]} \tag{18}$$

This discriminator is known to attain the Gaussian Unconditional
Cramer-Rao Bound (UCRBG) if $N \to \infty$ [1].

The simulations in Section 9 show that the CL-BQUE (Section 6)
becomes asymptotically equivalent to the GSML and, hence, both attain
the UCRBG (asymptotically). However, when the length of $\mathbf{r}$
is short, the variance of the CL-BQUE is below that of the GSML and
the UCRBG is not attained. This fact confirms the UCRBG as a suitable
bound for quadratic (unbiased) NDA discriminators but it also shows
that it cannot be attained if the observation window is not large
enough. In that case, the performance of the CL-BQUE (17) can be used
as a tighter bound valid for any quadratic unbiased NDA
discriminator.

8.  EXTENSION  TO  HIGHER-ORDER 
DISCRIMINATORS 

In this section the procedure for designing optimal open- and
closed-loop estimators is extended to $n$-th order discriminators
(with $n > 2$).

For the $n$-th order case, equation (3) can be rewritten as follows:

i  =  MflKw  (19) 

where 

K(n)  =  n  >  2  (even)  (20) 


Figure 2: Normalized timing variance as a function of $N$ for the
GSML and CL-BQUE discriminators (EbNo = 40 dB).

and $\otimes$ stands for the Kronecker product of matrices.

Odd values of $n$ are not considered because all modulation schemes
in practice are symmetric with respect to the origin and so the odd
moments are always null.

All the expressions included in previous sections are then usable
once the following substitutions are made:

ii  — ¥  a<-n) 

Tlx  —>TZ{^  =E[H{n)\ 

->  <^n)  =  &RW  (21) 

Qx  — ►  Q ln)  =  E  |^n)  j 

Generally, the complexity and the minor improvement reflected in the
system BER with respect to quadratic algorithms do not justify the
use of higher-order synchronizers in communications systems. However,
when the purpose is not strictly synchronization, but the exact
estimation and/or tracking of the time of arrival and/or the Doppler
offset of the incoming signal, which is the case of advanced
navigation and positioning systems (DGPS, GNSS, etc.), they could be
taken into account despite their complexity. In any case, the
higher-order study herein is valuable because it supplies new bounds
that give information on the potential improvement these techniques
can yield (Figure 1).

9.  SIMULATION  RESULTS 

The simulations have been carried out for the MSK (Minimum Shift
Keying) modulation as a particular case of the binary Continuous
Phase Modulations (CPM). This transmission scheme is adopted because
it allows a simple extension to any linear digital modulation and to
any multiple access modulation as well [6]. Recall that the Laurent
expansion [7] [5] allows the formulation of binary CPM signals in
terms of the model presented in Section 2. The simulations have been
done for additive white Gaussian noise (AWGN) and two samples per
symbol ($N_{ss} = 2$).
Figure 3: Frequency S-curves for the CL-BQUE, GSML, OL-BQUE and
OL-BAYES for $\Delta_\lambda = 0.5$ and $N = 4$.


-  Figure  1:  the  GSML  and  CL  —  BQUE  are  compared 
with  the  UCRBG  when  the  vector  of  samples  r  is  short 
(JV=4).  It  is  clear  that  in  this  case  the  variance  of  both 
discriminators  is  above  the  UCRBG.  The  discrepancies  are 
more  important  for  high  EbNo  because  the  gaussian  assump¬ 
tion  becomes  more  exact  as  the  EbNo  is  reduced.  Figure  1 
also  shows  how  fourth-order  detectors  [4]  are  capable  to  be 
below  the  UCRBG  and  nearer  the  DA  performance.  It  is 
also  remarkable  that  for  low  SNRs  the  UCRBG  becomes  a 
valid  bound  for  the  variance  of  any  NDA  detector  irrespec¬ 
tive  of  its  order. 

- Figure 2: the asymptotic convergence of the CL-BQUE and the
GSML is shown. The variance depicted in the figure is for an
open-loop configuration when A ≈ 0 (see also Fig. 4). The
corresponding closed-loop tracking variance is approximately
$L_0 = 0.5/(B_n T)$ times lower if the normalized loop bandwidth
($B_n T$) is very small and $L_0 > N$.

- Figures 3 and 4: the two open-loop schemes proposed in the
paper (Sections 4 and 5) are compared. Figure 4 shows how the
OL-BAYES can reduce the mean squared error within the designed
interval A\ (even for an extremely high EbNo = 40 dB) because
it is not forced to be unbiased (see Figure 3).

The behaviour of the studied closed-loop discriminators
(GSML and CL-BQUE, Sec. 6) for A ≠ 0 is also depicted.
Figures 3 and 4 show their specialization for A = 0
(tracking). It is also noteworthy that the CL-BQUE behaves
better than the GSML outside the steady-state situation
(A ≠ 0).

10.  CONCLUSIONS 

This paper presented a new, versatile approach for designing
both open- and closed-loop optimal synchronizers with
constraints on the S-curve shape (non-bias restrictions). If a
small amount of bias is tolerated, a very simple, elegant
Bayesian estimator was formulated in Sec. 5 which is found to
reduce the mean squared error within the designed range of the
parameter.

Figure 4: Normalized mean square error as a function of A for
the CL-BQUE, GSML, OL-BQUE and OL-BAYES with A\ = 0.5,
N = 16 and Nss = 4.

For the closed-loop case, an optimal (unbiased) parameter
error detector is obtained whose tracking variance is a lower
bound for any NDA quadratic unbiased discriminator, regardless
of the number of samples it processes. A comparison with the
classical GSML is carried out and their asymptotic convergence
is demonstrated empirically. However, the BQUE is observed to
outperform the GSML for short data vectors and high SNRs.

Finally, the formulation of the paper is extended to
higher-order synchronizers and their utilization is discussed.
ABSTRACT

An  efficient  algorithm  for  estimating  the  peak  position  of  a 
sampled  function  is  presented.  The  algorithm  uses  the  Hilbert 
transform  of  the  function  for  peak  detection  via  interpolation. 
The  accuracy  of  the  proposed  method  is  demonstrated  using  an 
example  where  the  frequency  of  a  sinusoid  is  determined  by 
detecting  the  peak  of  the  FFT  of  the  signal.  It  is  shown  that  the 
algorithm  has  computational  advantage  when  the  positions  of 
many  peaks  of  the  sampled  function  are  required  to  be 
estimated,  e.g.  as  in  the  fundamental  and  harmonic  frequency 
estimation  of  an  audio  signal.  Spectral  characteristics  of  the 
Hilbert  transform  amplitude  and  phase  functions  and  the 
rationale  for  the  use  of  Hilbert  transform  for  interpolation  are 
also  discussed  in  detail. 

1.  INTRODUCTION 

Accurate  peak  position  estimation  of  a  sampled  function  is 
necessary  in  many  Digital  Signal  Processing  applications.  For 
example,  precise  time  delay  estimation  in  radar/sonar 
applications  [1],  accurate  signal  frequency  estimation  via  the 
FFT  algorithm  [2],  detection  of  R  wave  in  ECG  signals  for  time 
alignment  [3]  and  the  estimation  of  the  position  of  many  peaks  in 
a  time-frequency  distribution  [4]  are  some  of  the  applications 
where peak detection is required in the processing. When the
function is continuous in time, the peak position can simply be
estimated using any existing gradient-based search algorithm. In
sampled signals, however, accurate peak detection becomes
computationally intensive because the exact peak usually lies
between sample values of the function. In such cases the peak
position is determined by signal interpolation techniques, which
are themselves computationally intensive. For example, frequency
estimation using the FFT algorithm requires many DFT calculations
before a sufficiently accurate frequency estimate can be
obtained [2].

This  paper  proposes  a  computationally  efficient  algorithm  for  the 
peak  detection  and  position  estimation  of  a  sampled  function. 
The  algorithm  is  based  on  a  novel  signal  interpolation  technique: 
The  technique  relies  on  the  Hilbert  Transform  (HT)  of  the 
sampled  signal  which  can  be  efficiently  used  to  interpolate  the 
signal  in  between  samples.  (The  HT  interpolation  technique  has 
been  successfully  used  in  a  fractional  sampling  application  in  an 
array  processing  example  in  reference  [5]).  The  accuracy  of  the 
HT  based  signal  interpolation  technique  as  well  as  the 
performance  of  the  peak  position  estimation  algorithm  are 
discussed  in  the  following  sections. 


2.  HILBERT  TRANSFORM  INTERPOLATION 

Let the sequence $x(k)$, $k \in \mathbb{Z}$, be obtained by
uniformly sampling a real function $x(t)$ at sampling intervals of
$T$, i.e. $x(k) = x(t)|_{t=kT}$. Consider the problem of estimating
the signal value $x(t)$ at some time $t = kT + \varepsilon T$, where
$0 < \varepsilon < 1$, using the sequence $x(k)$.

Suppose $z(k)$ is the analytic signal associated with $x(k)$, i.e.

$$z(k) = x(k) + jH\{x(k)\}, \qquad (1)$$

where $H\{\cdot\}$ denotes the Hilbert transform (HT). Using equation
(1) the amplitude and phase of the analytic signal can be
respectively obtained as

$$A(k) = |z(k)|; \qquad \phi(k) = \arg(z(k)). \qquad (2)$$

The  following  points  are  noted: 

1. In some applications, e.g. in radar/sonar and also in digital
communications, the Hilbert transform of the signal is
available at the receiver without the need for additional
processing. This is because of the quadrature demodulation
at the receiver.

2. In the absence of noise, the functions $A(k)$ and $\phi(k)$
obtained via the Hilbert transform operation are both
slowly varying. (See Appendix for a detailed discussion.)

As $A(k)$ and $\phi(k)$ are slowly varying, it is possible to linearly
interpolate them to obtain an estimate of the analytic signal at
time $t = kT + \varepsilon T$. That is, $z(k+\varepsilon)$ can be
derived using the following relations:

$$|z(k+\varepsilon)| = \varepsilon A(k+1) + (1-\varepsilon)A(k); \qquad (3)$$

$$\arg(z(k+\varepsilon)) = \varepsilon\phi(k+1) + (1-\varepsilon)\phi(k). \qquad (4)$$

Once $z(k+\varepsilon)$ is known, $x(kT + \varepsilon T)$ results from
the real part of $z(k+\varepsilon)$.
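
The interpolation in relations (3) and (4) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `analytic_signal` and `ht_interpolate` are hypothetical, the analytic signal is built with a naive DFT for self-containment, and the phase difference $\phi(k+1) - \phi(k)$ is assumed to be taken after unwrapping (here via the wrapped phase increment between neighbouring samples).

```python
import cmath
import math


def analytic_signal(x):
    """Return z(k) = x(k) + j*H{x(k)} using a naive O(N^2) DFT.

    Negative-frequency bins are zeroed and positive-frequency bins doubled,
    a common discrete construction of the analytic signal.
    """
    n = len(x)
    spectrum = [
        sum(x[p] * cmath.exp(-2j * math.pi * k * p / n) for p in range(n))
        for k in range(n)
    ]
    half = n // 2
    weights = [0.0] * n
    weights[0] = 1.0
    if n % 2 == 0:
        weights[half] = 1.0
        for k in range(1, half):
            weights[k] = 2.0
    else:
        for k in range(1, half + 1):
            weights[k] = 2.0
    return [
        sum(weights[k] * spectrum[k] * cmath.exp(2j * math.pi * k * p / n)
            for k in range(n)) / n
        for p in range(n)
    ]


def ht_interpolate(z, k, eps):
    """Estimate x(kT + eps*T) from the analytic samples z[k] and z[k+1]."""
    amp = eps * abs(z[k + 1]) + (1.0 - eps) * abs(z[k])      # relation (3)
    # Wrapped phase increment, standing in for phi(k+1) - phi(k) after
    # unwrapping (an assumption of this sketch).
    dphi = cmath.phase(z[k + 1] * z[k].conjugate())
    phase = cmath.phase(z[k]) + eps * dphi                   # relation (4)
    return amp * math.cos(phase)


# Usage: a sinusoid with exactly 10 cycles in 50 samples.
N, f = 50, 0.2
x = [math.cos(2 * math.pi * f * k) for k in range(N)]
z = analytic_signal(x)
estimate = ht_interpolate(z, 10, 0.37)
exact = math.cos(2 * math.pi * f * 10.37)
linear = 0.37 * x[11] + 0.63 * x[10]   # direct linear interpolation of x
```

For this constant-amplitude test signal the amplitude and phase are exactly linear in time, so the HT estimate agrees with the exact value to rounding error, while direct linear interpolation of the samples does not.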

Table 1 shows the results of an experiment performed to
determine the accuracy of the Hilbert transform interpolation
technique. The following linear frequency modulated signal
having a Gaussian-shaped envelope was used in the experiment:

$$x(t) = e^{-10t^2}\cos(2\pi\alpha t + \pi\beta t^2), \qquad (5)$$

with $\alpha = 240$ Hz and $\beta = 120$ Hz/s. The signal duration was
selected as $-1 < t < +1$ seconds. At first a sequence $x(k)$ was
obtained by sampling the signal in equation (5) at a 1 kHz
sampling frequency. Note that the signal in equation (5) occupies
the full Nyquist bandwidth ±500 Hz. Suppose another sequence
is defined as the values obtained by sampling the signal $x(t)$ at a
frequency of $d$ kHz ($d \in \mathbb{R}$), i.e.

$$x_d(k) = x(t)|_{t=k/d}. \qquad (6)$$


Value of d    Maximum Error from       Maximum Error from
              H.T. Interpolation       Linear Interpolation

0.092         7.2403 x 10^-6           4.3975
0.320         7.2402 x 10^-6           4.3975
0.900         7.2413 x 10^-6           4.3989
1.100         7.2370 x 10^-6           4.3931
3.900         10.454 x 10^-6           3.9211
13.70         11.990 x 10^-6           4.9335

TABLE 1: Comparison of Hilbert Transform Interpolation with
Linear Interpolation.

(Note that the above corresponds to a sampling rate conversion
of the original signal.) We can estimate the sequence $x_d(k)$
from the sequence $x(k)$ via the HT interpolation technique. The
exact value of $x_d(k)$ can also be obtained using equation (5).

Therefore, it is possible to calculate the estimation error, and
thus evaluate the performance of the HT interpolation algorithm.
Table 1 shows the performance results of the Hilbert transform
interpolation technique in obtaining the sequence $x_d(k)$ from
the sequence $x(k)$. For comparison purposes, results from a linear
interpolation algorithm are also shown in Table 1. The results in
Table 1 demonstrate that the sequence values $x_d(k)$ can be
accurately estimated (with an error of the order of $10^{-5}$) using
the described Hilbert transform interpolation technique.

3. PEAK DETECTION VIA THE HILBERT
TRANSFORM

What is required here is to estimate the peak position of the
function $x(t)$ using the sampled sequence $x(k)$. The first step
is a coarse estimate: determine the sampling interval in which the
peak of the function $x(t)$ is located. This can easily be done by
detecting the peak value of the sequence $x(k)$ and then
examining the right and left neighbor samples of the peak sample.

Suppose $m$ and $(m+1)$ denote the end points of the interval
resulting from the coarse estimate. Using the relations (3) and (4),
the HT interpolated function $x(t)$ within this interval can be
obtained as

$$x(mT + \varepsilon T) = \{\varepsilon A(m+1) + (1-\varepsilon)A(m)\}\cos(\varepsilon\phi(m+1) + (1-\varepsilon)\phi(m)). \qquad (7)$$

The value of $\varepsilon$ which maximizes $x(mT + \varepsilon T)$ can be
determined by differentiating the right-hand side of equation (7)
with respect to $\varepsilon$ and equating it to zero. That is, to obtain

$$\{\varepsilon A(m+1) + (1-\varepsilon)A(m)\}\{\phi(m+1) - \phi(m)\}\sin(\varepsilon\phi(m+1) + (1-\varepsilon)\phi(m)) = \{A(m+1) - A(m)\}\cos(\varepsilon\phi(m+1) + (1-\varepsilon)\phi(m)). \qquad (8)$$

Suppose $\gamma$ is defined as

$$\gamma = (A(m+1) - A(m))/A(m); \qquad (9)$$

then equation (8) can be expressed as

$$\varepsilon = \frac{1}{\phi(m+1) - \phi(m)}\left[\tan^{-1}\left(\frac{\gamma}{(1+\varepsilon\gamma)(\phi(m+1) - \phi(m))}\right) - \phi(m)\right]. \qquad (10)$$

It is then possible to numerically solve equation (10) for the
value of $\varepsilon$. However, as $0 < \varepsilon < 1$ and $\gamma \ll 1$
(see appendix), $1 + \varepsilon\gamma \approx 1$. Therefore, equation (10)
simplifies to yield a direct solution for $\varepsilon$ as

$$\varepsilon_0 = \frac{\tan^{-1}\left(\gamma/(\phi(m+1) - \phi(m))\right) - \phi(m)}{\phi(m+1) - \phi(m)}. \qquad (11)$$

Thus the peak position of the function $x(t)$ can be obtained
as $t = mT + \varepsilon_0 T$ using the amplitude $A(k)$ and phase
$\phi(k)$ of the analytic signal.
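
The two-stage procedure of this section (coarse interval, then a closed-form offset) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the authors' code: `coarse_interval` and `peak_offset` are hypothetical names, the offset is computed as $\varepsilon_0 = [\tan^{-1}(\gamma/\Delta\phi) - \phi(m)]/\Delta\phi$ with $\Delta\phi = \phi(m+1) - \phi(m)$ (the stationarity condition of (7) with $1 + \varepsilon\gamma \approx 1$), and $\phi(m)$ is taken as the principal value in $(-\pi, \pi]$.

```python
import cmath
import math


def coarse_interval(x):
    """Return m such that the peak of x is assumed to lie in [m, m+1]."""
    p = max(range(len(x)), key=lambda k: x[k])
    if p == 0:
        return 0
    if p == len(x) - 1:
        return p - 1
    # The larger neighbour of the peak sample indicates the side of the peak.
    return p if x[p + 1] >= x[p - 1] else p - 1


def peak_offset(z, m):
    """Fractional peak offset eps0 inside [m, m+1], assuming 1 + eps*gamma ~ 1."""
    a0, a1 = abs(z[m]), abs(z[m + 1])
    gamma = (a1 - a0) / a0
    dphi = cmath.phase(z[m + 1] * z[m].conjugate())  # wrapped phi(m+1) - phi(m)
    phi_m = cmath.phase(z[m])                        # principal value of phi(m)
    return (math.atan(gamma / dphi) - phi_m) / dphi


# Usage with a hand-built analytic signal whose real part peaks at t = 10.37
# (sampling interval T = 1).
f, t0 = 0.05, 10.37
z = [cmath.exp(2j * math.pi * f * (k - t0)) for k in range(16)]
x = [zk.real for zk in z]
m = coarse_interval(x)
peak_position = m + peak_offset(z, m)
```

In this constant-amplitude example $\gamma = 0$, so the offset reduces to $-\phi(m)/\Delta\phi$ and the recovered peak position matches `t0` to rounding error.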


4. AN EXAMPLE OF FREQUENCY ESTIMATION
USING THE HT

To demonstrate the accuracy of the above peak position
estimation algorithm, consider the following example. Suppose it
is required to estimate the frequency $f_0$ of a noisy sinusoidal
signal given by

$$s(p) = e^{j2\pi f_0 p} + v(p)/\sigma^2, \qquad 0 \le p \le N-1, \qquad (12)$$

where $v(p)$ is an independent, identically distributed complex
white Gaussian noise sequence with unit variance; $\sigma$ is the
signal-to-noise ratio (SNR) associated with the signal. The maximum
likelihood (ML) method of frequency estimation is to compute
the Discrete Fourier Transform (DFT) of the signal and
determine the frequency where the absolute value of the DFT is a
maximum. As this is an extremely computationally intensive
procedure, conventional algorithms work according to the following
method [2].


STEP 1: Use an N-point FFT of the sampled signal,

$$x(k) = \sum_{p=0}^{N-1} s(p)\,e^{-j2\pi pk/N}, \qquad 0 \le k \le N-1, \qquad (13)$$

to determine a coarse frequency estimate, i.e. determine the peak
position interval. An indication of the computational load in this
step can be obtained from the number of required multiplication
operations, which is $N\log_2 N$.


STEP 2: Once the coarse estimate is obtained, a fine frequency
estimate is obtained by evaluating a large number of DFTs
within the estimated coarse interval. Suppose the required
frequency estimation accuracy is $f_{acc}$; the number of
multiplication operations necessary to achieve this is
$N\log_2(1/(N f_{acc}))$.

Note  that  STEP  2  is  computationally  intensive  in  comparison  to 
STEP  1.  The  computation  speed  can  be  greatly  increased  by 
using  the  HT  interpolation  on  the  sequence  x(k)  defined  in 
equation  (13),  followed  by  the  detection  of  the  peak  position. 


Figure 1: The mean square frequency estimation error versus the
SNR (in dB), for different values of N. The value of N increases
from the top plot to the bottom plot as 8, 16, 32, 64, 128, 256,
512. The HT interpolation method is used for estimating the peak.
(Dashed lines are the Cramer-Rao lower bounds.)

Figure 1 shows the frequency estimation results obtained from
such an HT interpolation method. The results of Figure 1 are
identical to the results reported in [2]. The frequency estimates
in Figure 1 also achieve the Cramer-Rao bound given by

$$\mathrm{CRB}(snr, N) = \sqrt{\frac{3}{2\pi^2 N(N+1)(N-1)\,snr}}, \qquad (14)$$

which is the theoretical minimum achievable error [2].
Note that in equation (3) it was assumed that $A(k)$ is slowly
varying. In the presence of noise, to satisfy this condition, $x(k)$
in equation (13) was obtained using a (real-valued) FFT of
length $4N$. Two Hilbert transforms at the end points of the
coarse interval were then evaluated. The number of multiplication
operations required for the two HT operations is $8N$. (It is
assumed here that the HT can be calculated with $4N$
multiplications.)

5.  A  COMPARISON  OF  COMPUTATIONAL 
EFFICIENCY 

From the discussion in the previous section, the number of
multiplication operations required for the peak position using the
HT interpolation method can be deduced as:

$$Q_{HT} = N\log_2(4N) + 8N. \qquad (15)$$

Using the Cramer-Rao bound in equation (14) as the required
frequency accuracy, $f_{acc}$, the number of multiplication
operations required in the method using DFTs can be determined
to be:

$$Q_{DFT} = N\log_2(N) + N\log_2\left(\frac{1}{N}\sqrt{\frac{2\pi^2 N(N+1)(N-1)\,snr}{3}}\right). \qquad (16)$$

Calculating  the  number  of  multiplication  operations  in  equations 
(15)  and  (16)  for  various  values  of  N  and  SNR,  it  can  be  shown 
that  the  computational  load  of  the  proposed  HT  method  is 
comparable  to  the  DFT  method  in  estimating  the  frequency  of  the 
sinusoid.  However,  the  proposed  HT  method  does  not  require  the 
evaluation  of  many  DFTs  and  is  extremely  efficient  when  a  large 
number  of  peaks  are  required  to  be  estimated,  such  as  in  a 
periodogram (time-frequency) analysis [4]. This is because the
computational load in the proposed method is independent of the
number of peaks to be estimated, whereas the amount of
computation in the DFT method is proportional to the number of
peaks that need to be estimated.
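
The operation counts compared in this section can be tabulated with a short Python sketch. It rests on assumptions that should be checked against the original paper: the bound is taken in the standard form $\sqrt{3/(2\pi^2 N(N^2-1)\,snr)}$ with `snr` as a linear (not dB) ratio, the HT cost is $N\log_2(4N) + 8N$, and only the fine-search term of the DFT method is scaled by the hypothetical parameter `n_peaks`.

```python
import math


def crb(snr, n):
    """Assumed Cramer-Rao bound on the frequency error (standard deviation)."""
    return math.sqrt(3.0 / (2.0 * math.pi ** 2 * n * (n + 1) * (n - 1) * snr))


def q_ht(n):
    """Multiplications for the HT method: one FFT stage plus two HTs."""
    return n * math.log2(4 * n) + 8 * n


def q_dft(n, snr, n_peaks=1):
    """Multiplications for the DFT method with f_acc set to the bound.

    Scaling only the refinement term by n_peaks is an assumption of this
    sketch, motivated by the remark that the DFT cost grows with the
    number of peaks.
    """
    f_acc = crb(snr, n)
    return n * math.log2(n) + n_peaks * n * math.log2(1.0 / (n * f_acc))


# Usage: compare the counts for a few record lengths at a linear SNR of 10.
for n in (64, 256, 1024):
    print(n, round(q_ht(n)), round(q_dft(n, 10.0)), round(q_dft(n, 10.0, 8)))
```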

6.  CONCLUSION 

A  technique  to  estimate  the  peak  position  of  a  sampled  signal  is 
proposed.  The  technique  is  based  on  the  HT  of  the  sampled 
signal, which can be efficiently used to interpolate the signal
between sample points. The proposed technique can be used in
many  engineering  applications  to  reduce  the  computational  load 
of  algorithms.  In  a  frequency  estimation  example  it  has  been 
shown  that  the  proposed  method  can  reduce  the  computational 
load  by  a  significant  factor,  especially  when  the  number  of  peak 
positions  that  are  required  to  be  estimated  is  large. 

7. APPENDIX: SPECTRAL CHARACTERISTICS OF
FUNCTIONS A(k) AND φ(k) ASSOCIATED WITH
THE ANALYTIC SIGNAL

In section 2, the signal $x(k)$ has been represented using an
analytic signal derived via the Hilbert transform. Spectral
characteristics of the amplitude $A(k)$ and phase $\phi(k)$ of the
analytic signal in such a representation are provided in this
appendix. The following discussion also provides a rationale for
the selection of the Hilbert transform for the interpolation.

A1. Frequency Support of A(k) and φ(k):

From equations (1) and (2) we get

$$x(k) + jH\{x(k)\} = A(k)\cos(\phi(k)) + jA(k)\sin(\phi(k)). \qquad (a1)$$

This requires that

$$H\{A(k)\cos(\phi(k))\} = A(k)\sin(\phi(k)). \qquad (a2)$$

To determine the necessary conditions for equation (a2) to be
satisfied, consider $H\{A(k)B(k)\}$, where $B(k) = \cos(\phi(k))$.
Using the convolution relation of the Discrete Time Fourier
Transform (DTFT) we get

$$H\{A(k)B(k)\} = \int_{-1/2T}^{1/2T}\int_{-1/2T}^{1/2T} -j\,S_A(\varphi)\,S_B(f-\varphi)\,d\varphi\;\mathrm{sgn}(f)\,e^{j2\pi fkT}\,df, \qquad (a3)$$

where $S_A(\varphi)$ and $S_B(f)$ denote the DTFT of the signals
$A(k)$ and $B(k)$, respectively; $\mathrm{sgn}(\cdot)$ is the sign function.
Suppose $S_A(\varphi)$ and $S_B(f)$ are such that, in the frequency
support regions of the product $S_A(\varphi)S_B(f)$ and $\mathrm{sgn}(f+\varphi)$,
$\mathrm{sgn}(f+\varphi)$ can be expressed independently of the argument
$\varphi$, i.e.,

$$S_A(\varphi)S_B(f)\,\mathrm{sgn}(f+\varphi) = S_A(\varphi)S_B(f)\,\mathrm{sgn}(f); \qquad (a4)$$

then it can be shown that

$$H\{A(k)B(k)\} = A(k)\,H\{B(k)\}. \qquad (a5)$$

That is,

$$H\{A(k)\cos(\phi(k))\} = A(k)\,H\{\cos(\phi(k))\}. \qquad (a6)$$

The condition in equation (a4) requires that the functions $S_A(f)$
and $S_B(f)$ are of low-pass and high-pass (but limited to the
half sampling frequency) type, respectively, and that their spectra
do not overlap [6][7]. In other words, $S_A(f)$ and $S_B(f)$ are
such that

$$S_A(f) = 0 \text{ for } |f| > u, \qquad S_B(f) = 0 \text{ for } |f| < v^- \text{ and } |f| > v^+, \qquad (a7)$$

where $0 < u < v^-$ and $v^+ < f_s/2 = 1/2T$.

A2. Conditions on the Variation of the Amplitude
Function A(k):

Since $A(k)$ is band-limited to $\pm u$, as shown in equation (a7),
the following expression can be obtained via the DTFT:

$$A(k) - A(k-1) = \int_{-u}^{u} S_A(f)\,[1 - e^{-j2\pi fT}]\,e^{j2\pi fkT}\,df. \qquad (a8)$$

Using the Schwarz inequality, the following results:


y= 


A(k)-  A(k-l) 


A(k) 


U 

J2sin(7t/7V/ 


<  4  K 


V 


(a9) 


As $u \ll f_s/2$, $A(k)$ is a slowly varying function and
therefore a linear interpolation can be performed to estimate the
signal $A(k)$ between sampling points.

A3. Conditions on the Variation of Phase Function φ(k):

From equations (a2) and (a6) we get

$$H\{\cos(\phi(k))\} = \sin(\phi(k)). \qquad (a10)$$

If the condition in equation (a10) is satisfied then the signal
$B(k) = \cos(\phi(k))$ can be obtained as the real part of an analytic
signal $c(k) = e^{j\phi(k)}$. Note that the instantaneous frequency of
the signal $c(k)$ is given by [8],

$$(\phi(k+1) - \phi(k))/2\pi T. \qquad (a11)$$

Combining equations (a7) and (a11) the following can be
obtained:

$$2\pi v^-/f_s < \phi(k+1) - \phi(k) < 2\pi v^+/f_s. \qquad (a12)$$

The phase function $\phi(k)$, therefore, is monotonic and has
a positive slope. Using (a12) a condition for the variation
of the second difference of $\phi(k)$ can be obtained as

$$|\phi(k+1) - 2\phi(k) + \phi(k-1)| < 2\pi(v^+ - v^-)/f_s. \qquad (a13)$$

As the accuracy of linear interpolation depends on the deviation
of the function from linearity, it can be noted from (a13) that if
$(v^+ - v^-) \ll f_s/2$, the phase $\phi(k)$ can be accurately
interpolated linearly.

A4. Total Signal Bandwidth and Rationale for Interpolating
using Functions A(k) and φ(k):

As $x(k) = A(k)B(k)$, the high frequency extent of the signal
$x(k)$ is given by $f_H = v^+ + 2u$. (A more rigorous proof of this
via the concepts of instantaneous frequency and instantaneous
bandwidth is provided in reference [9].) Therefore, to avoid
aliasing in the sampling process, it is also necessary that
$v^+ + 2u < f_s/2$. Note that the signal $x(k)$ can be directly
linearly interpolated provided that $f_H = v^+ + 2u \ll f_s/2$,
whereas the conditions for HT interpolation are (i)
$u \ll f_s/2$ (for interpolating $A(k)$) and (ii)
$(v^+ - v^-) \ll f_s/2$ (for interpolating $\phi(k)$). As noted in
equation (a7), since $0 < u < v^- < v^+ < f_H < f_s/2$, it is clear
from the discussion that the conditions for HT interpolation are
far less stringent than the conditions for a linear interpolation.
ABSTRACT

Recursive and efficient estimation of polynomial phase
is considered here, with alternatives to the standard
Gauss-Newton approach presented. We consider
approximations of the likelihood and phase noise
distribution to derive recursive approximate maximum
likelihood and Bayesian estimators. Monte Carlo
simulations indicate that these methods compare favourably
with the Gauss-Newton scheme both in terms of
computational expense and efficiency thresholds.

1.  INTRODUCTION 

Estimating the coefficients of a polynomial-phase signal
in the presence of noise arises in many applications
in signal processing [3] [4] [9]. Assume that we have
observations

$$z_t = A e^{i\phi_t} + \eta_t, \qquad (1)$$

where $\phi_t = \sum_{k=0}^{K}\theta_k t^k$, $t = 0, 1, \ldots, T-1$, $A$ is a
real-valued constant, $\Theta = [\theta_0, \ldots, \theta_K]'$ is the unknown
parameter vector to be estimated, and $\{\eta_t\}$ is a complex
white normal sequence with mean zero and variance
$\sigma^2$.

Maximum likelihood estimation of these parameters
is difficult due to the many local maxima in the
likelihood, and the subsequent computational expense.
Recently, [1] has proposed a nonlinear least squares
estimation scheme that improves the numerical
properties significantly; however, the computational burden
remains high.


Fast methods for polynomial-phase estimation are
concerned with phase unwrapping followed by
regression, as in [7] [11], or differencing in phase, as in [3] [9].
A method to obtain efficient and direct estimation of
polynomial phase signals in real time is still an open
problem. However, if we have initial estimates of the
parameters that are within a certain accuracy, but do
not attain the Cramer-Rao Bound (CRB), we could
obtain efficient estimation based on these initial values.
It has long been recognised that inefficient estimation
can be improved by a single step of an iterative process
that leads to fully efficient estimates; see [2], section
9.2. Such procedures have been reported in frequency
estimation [5] [10], and a similar idea has been
introduced for polynomial phase estimation [4].

The standard approach to iterative refinement is
the use of the Gauss-Newton method. The main
advantage is the locally quadratic convergence to the
solution; however, if the initialisation is not sufficiently
accurate, convergence may be towards a local minimum
or saddle point. Techniques exist to improve the
estimation accuracy; however, the majority of these involve
line searching. The computational expense relating to
the inversion of the Hessian matrix at each step is also
a disadvantage.

In this paper, we consider alternative iterative
refinement techniques for polynomial phase estimation.
Firstly, a 2nd-order Taylor approximation of the
likelihood equations is used to derive a recursive scheme
in section 2. We then consider Bayesian approaches
to recursive estimation, where we propose a 2nd-order
approximation of the likelihood function, considered in
terms of the phase angle, to produce a Gaussian
density. This is derived in section 3. Monte Carlo
simulations were then computed, and the results are shown
in section 4. Here it is shown that the Bayesian
approach yields the best performance as far as attaining
the Cramer-Rao bound is concerned, while the
approximate maximum likelihood scheme is the most
computationally efficient.


parameter vector $\Theta$, which can be solved to yield the
improved estimate. We of course do not know the
sequence $\{x_t\}$ precisely, so we use an algorithm that uses
this solution method but proceeds recursively through
the sample. At each sample step $t$, if we have a
reasonable estimate $\hat\Theta_t$ of $\Theta$, with
$\hat\phi_t = \sum_{k=0}^{K}\hat\theta_k t^k$, then


P 


<h  ~Vi 
2tt 


>  1 


Vt 


©(n) 


(6) 


2. LINEARISING THE LIKELIHOOD
EQUATIONS

For the signal model given in (1), the negative
log-likelihood (up to additive and multiplicative constants)
is

$$J = \sum_{t=0}^{T-1}|z_t - A\exp(i\phi_t)|^2 = \sum_{t=0}^{T-1}\{\rho_t^2 + A^2 - 2A\rho_t\cos(y_t - \phi_t)\},$$

where $y_t$ is the wrapped phase of $z_t$, and $\rho_t = |z_t|$.
This is of course also the cost function that would be
minimised in the nonlinear least squares approach to
estimation. Using this representation, it is clear that
we need only minimise the expression

$$J = J(\Theta) = -\sum_{t=0}^{T-1}\rho_t\cos(y_t - \phi_t) \qquad (2)$$

with respect to $\phi_t$, and hence $\Theta$. The partial derivatives
of (2) are, for each $j = 0, 1, \ldots, K$,

$$\frac{\partial J}{\partial\theta_j} = -\sum_{t=0}^{T-1}\rho_t t^j\sin(y_t - \phi_t) \qquad (3)$$

$$= -\sum_{t=0}^{T-1}\rho_t t^j\sin(y_t + 2\pi x_t - \phi_t) \qquad (4)$$

$$\approx -\sum_{t=0}^{T-1}\rho_t t^j\,(y_t + 2\pi x_t - \phi_t), \qquad (5)$$

where the final line is a 2nd-order Taylor approximation
about zero, and we have introduced the integer
process $\{x_t\}$ to account for the $2\pi$ phase ambiguity.
This approximation is accurate only if we have
estimated $x_t$ correctly for each $t$. Setting (5) to zero for
$j = 0, 1, \ldots, K$ yields a set of linear equations in the

when  < 2π, and where $[a]$ is the integer such that
$|a - [a]|$ achieves the minimum. The criterion
mentioned above corresponds to a signal-to-noise ratio of
approximately -10 dB. We then estimate $x_t$ via

$$\hat{x}_t = x_t(y_t, \hat\Theta_t) = \left[\frac{\hat\phi_t - y_t}{2\pi}\right] \qquad (7)$$

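and hence solve the equations $P_t\Theta = b_t$ where

$$P_t = \begin{bmatrix} \sum\rho_n & \sum n\rho_n & \cdots & \sum n^K\rho_n \\ \sum n\rho_n & \sum n^2\rho_n & \cdots & \sum n^{K+1}\rho_n \\ \vdots & \vdots & & \vdots \\ \sum n^K\rho_n & \sum n^{K+1}\rho_n & \cdots & \sum n^{2K}\rho_n \end{bmatrix}, \qquad b_t = \begin{bmatrix} \sum\rho_n(y_n + 2\pi x_n) \\ \sum n\rho_n(y_n + 2\pi x_n) \\ \vdots \\ \sum n^K\rho_n(y_n + 2\pi x_n) \end{bmatrix},$$

where the summations are for $n = 0, 1, \ldots, t$. Using
$h_t = [1, t, \ldots, t^K]'$, we obtain

$$P_t = \sum_{n=0}^{t}\rho_n h_n h_n' \qquad (8)$$

$$= P_{t-1} + \rho_t h_t h_t' \qquad (9)$$

which, after inverting using the standard identity [8],
yields

$$P_t^{-1} = P_{t-1}^{-1} - \frac{\rho_t P_{t-1}^{-1}h_t h_t' P_{t-1}^{-1}}{1 + \rho_t h_t' P_{t-1}^{-1}h_t}. \qquad (10)$$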

We then have the following algorithm to implement the
recursive approximate maximum likelihood estimator.

a Initialise $\hat\Theta$ and calculate $P_K$ and $b_K$.

b for t = K+1, ..., T

(i) calculate $P_t^{-1}$ from (10) and
$b_t = b_{t-1} + \rho_t(y_t + 2\pi\hat{x}_t)h_t$

(ii) calculate $\hat\Theta_t = P_t^{-1}b_t$

c end

Steps (i) and (ii) above can be calculated in parallel
(with (ii) delayed) for greater speed.
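
One pass of the recursion can be sketched in pure Python as follows. This is a minimal illustration, not the authors' code: the function names `sherman_morrison` and `aml_step` are hypothetical, the integer $\hat{x}_t$ is obtained by rounding $(\hat\phi_t - y_t)/2\pi$ to the nearest integer, and the initial $P_K^{-1}$ and $b_K$ in the usage example are worked out by hand for a noiseless linear-phase signal ($K = 1$).

```python
import cmath
import math


def sherman_morrison(P_inv, h, rho):
    """Return (P + rho*h*h')^{-1} from P^{-1}, for symmetric P (rank-one update)."""
    n = len(h)
    v = [sum(P_inv[i][j] * h[j] for j in range(n)) for i in range(n)]  # P^{-1} h
    denom = 1.0 + rho * sum(h[i] * v[i] for i in range(n))
    return [[P_inv[i][j] - rho * v[i] * v[j] / denom for j in range(n)]
            for i in range(n)]


def aml_step(P_inv, b, theta, t, z_t):
    """Steps (i) and (ii) for one new sample z_t at time index t."""
    h = [float(t) ** k for k in range(len(theta))]          # h_t = [1, t, ..., t^K]
    rho, y = abs(z_t), cmath.phase(z_t)                     # magnitude, wrapped phase
    phi_hat = sum(th * hk for th, hk in zip(theta, h))      # predicted phase
    x_hat = round((phi_hat - y) / (2.0 * math.pi))          # integer ambiguity
    P_inv = sherman_morrison(P_inv, h, rho)
    b = [bi + rho * (y + 2.0 * math.pi * x_hat) * hi for bi, hi in zip(b, h)]
    n = len(b)
    theta = [sum(P_inv[i][j] * b[j] for j in range(n)) for i in range(n)]
    return P_inv, b, theta


# Usage: noiseless linear phase phi_t = 0.3 + 0.5 t, initialised from t = 0, 1.
true_theta = [0.3, 0.5]
P_inv = [[1.0, -1.0], [-1.0, 2.0]]   # inverse of P_1 = [[2, 1], [1, 1]]
b = [1.1, 0.8]                       # b_1 built from phases 0.3 and 0.8
theta = list(true_theta)
for t in range(2, 10):
    z_t = cmath.exp(1j * (true_theta[0] + true_theta[1] * t))
    P_inv, b, theta = aml_step(P_inv, b, theta, t, z_t)
```

Because each step only needs matrix-vector products of size $K+1$, the cost per sample is fixed, which is consistent with the linear growth in computation reported for this scheme.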


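3. GAUSSIAN APPROXIMATION BY
2ND-ORDER TAYLOR EXPANSION

A 2nd-order Taylor approximation of the likelihood for
the t-th observation, similarly to [6], yields

$$p(y_t\,|\,\Theta) \propto \exp\left\{-\frac{\rho_t A\,(y_t + 2\pi x_t - h_t'\Theta)^2}{2\sigma^2}\right\}.$$

We again estimate $x_t$ by (7); however, to robustify this
estimation, and unlike the scheme proposed in section
2, we search the integers in the neighborhood of the
best estimate. Define

$$G(X; \mu, \Sigma) = \frac{\exp\left(-\tfrac{1}{2}(X-\mu)'\Sigma^{-1}(X-\mu)\right)}{\sqrt{2\pi\det(\Sigma)}}.$$

Using $m = \hat{x}_t - 1, \hat{x}_t, \hat{x}_t + 1$, Bayesian formulae yield
the recursion

$$p(\Theta\,|\,y_t, \ldots, y_0) = \frac{p(y_t|\Theta)\,p(\Theta|y_{t-1}, \ldots, y_0)}{\int p(y_t|\Theta)\,p(\Theta|y_{t-1}, \ldots, y_0)\,d\Theta}$$

$$\propto \sum_m G(h_t'\Theta;\, y_t + 2\pi m,\, s_t)\,G(\Theta;\, \hat\Theta_{t-1}, \Sigma_{t-1})$$

$$= \sum_m w_m\,G(\Theta;\, \hat\Theta_{m,t}, \Sigma_t) \qquad (11)$$

where, using the lemma in [6],

$$s_t = \frac{\sigma^2}{A\rho_t},$$

$$\Sigma_t = \Sigma_{t-1} - \frac{\Sigma_{t-1}h_t h_t'\Sigma_{t-1}}{s_t + h_t'\Sigma_{t-1}h_t},$$

$$\hat\Theta_{m,t} = \hat\Theta_{t-1} + \frac{\Sigma_{t-1}h_t}{s_t + h_t'\Sigma_{t-1}h_t}\,(y_t + 2\pi m - h_t'\hat\Theta_{t-1}),$$

$$w_m \propto \exp\left(-\frac{(y_t + 2\pi m - h_t'\hat\Theta_{t-1})^2}{2\,(s_t + h_t'\Sigma_{t-1}h_t)}\right),$$

and $\sum_m w_m = 1$. The number of Gaussian components
in (11) will increase exponentially. To overcome this,
the maximum entropy criterion [6] is used to combine
the Gaussian sum into a single Gaussian pdf. The
updated mean and variance estimates can then be shown
to be

$$\hat\Theta_t = \sum_m w_m\,\hat\Theta_{m,t} \qquad (12)$$

$$\tilde\Sigma_t = \Sigma_t + \sum_m w_m\,(\hat\Theta_{m,t} - \hat\Theta_t)(\hat\Theta_{m,t} - \hat\Theta_t)' \qquad (13)$$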

4.  SIMULATIONS 

We considered the computational and statistical
efficiency of the above estimators and compared them with
the Gauss-Newton method. We consider a constant
amplitude chirp signal for this problem, with parameter
vector $\Theta = [1.4, -0.4, 0.03]'$. Figure 1 gives an
indication of the performance of these estimators for varying
initial accuracy and signal-to-noise ratios. The initial
values were chosen as Gaussian random variables with
mean equal to the true values and variance chosen such
that the mean square error of the estimators was the
relative efficiency prescribed.


Figure 1: Performance of the Gauss-Newton (-),
approximate maximum likelihood (+-) and Bayesian
filtering (-) schemes compared with the Cramer-Rao
bound (solid). 1000 simulations were run, with T =
128 and initial values having accuracy 50% and ^p-%
for parts (a) and (b) respectively.


From these plots, we can clearly see the superior
performance of the Bayesian scheme when compared
with the approximate maximum likelihood scheme and
the Gauss-Newton, especially when initialisation is poor.
The approximate maximum likelihood approach
performs slightly better than the Gauss-Newton scheme
utilised, and is more robust to poor initialisation with
no additional computation.

Figure 2: Number of floating point operations
using Gauss-Newton (-), the approximate maximum
likelihood method (+-) and the Bayesian filtering
scheme (-). 50 simulations were run, with T =
{32, 64, 128, 256, 512} and initial values having relative
efficiency of 50% and i§?% for parts (a) and (b)
respectively. The signal to noise ratio was fixed at 10 dB.

This improved performance comes at the cost of
slightly more computation, as seen in figure 2. No
exact theoretical analysis of computational complexity has
been provided; this plot merely provides flop counts
as calculated in Matlab. From these, we can see the
linear performance of the approximate maximum
likelihood and Bayesian methods, and that the
computational requirements of the Gauss-Newton method are
not significantly different. It should be noted that the
Gauss-Newton approach we have taken is the fastest
converging approach; the estimation performance may
be improved at the expense of greater computation.
ABSTRACT

As  a  complement  to  the  Periodogram,  low  complexity  frequency 
estimators  are  of  interest.  Different  designs  of  these  estimators 
may  affect  the  performance  significantly.  In  this  paper  we  consider 
correlation  based  estimators  and  present  a  design  strategy  that  out¬ 
performs  most  estimators  in  the  same  class.  We  give  a  closed  form 
expression  for  the  asymptotic  performance  together  with  a  new 
method  of  phase  unwrapping  to  resolve  an  introduced  frequency 
ambiguity.  Finally,  we  illustrate  the  performance  through  a  design 
example. 

1.  INTRODUCTION 

Estimation of the parameters of a noise corrupted sinusoidal model
is a frequently addressed problem in the signal processing literature.
The signal model is of interest in different application areas,
such as communications, radar, measurements and geophysical
exploration, among others. Starting with an observed sample
$\{y(0), \ldots, y(N-1)\}$, where $N$ is the number of data points, there
exist numerous methods which can be used to estimate the sought
parameters.

Often,  the  estimation  of  the  frequencies  is  of  particular  inter¬ 
est.  It  is  well  known  that  in  most  applications,  excellent  estimates 
of  the  sought  frequencies  are  easily  obtained  by  peak-picking  the 
Periodogram  of  data,  i.e.,  the  magnitude  square  of  the  discrete 
Fourier  transform.  Besides  the  fact  that  the  Periodogram  is  an 
excellent  frequency  estimator,  it  can  be  efficiently  implemented 
using  the  fast  Fourier  transform  of  the  observations  followed  by  a 
search, or interpolation, for the spectral maxima. Thus, there are
basically only two scenarios where there is a need for alternative
methods, that is i) when the frequencies are so closely spaced that
they cannot be resolved by the Periodogram, and ii) when the
real-time constraints on numerical complexity require low-complexity
methods.

We focus on the second scenario only, i.e., low complexity
methods. Therefore, consider the single-tone model

$$y(n) = a e^{i\omega n} + e(n), \qquad n = 0, \ldots, N-1 \tag{1}$$

where $a = |a| e^{i\phi}$ is a complex-valued amplitude, and $\omega \in [-\pi, \pi)$
is the normalized (angular) frequency. The noise $e(n)$ is zero mean
complex-valued circular white Gaussian with variance $\sigma^2$. The
parameters $(|a|, \phi, \omega, \sigma^2)$ are all unknown, but the frequency $\omega$ is
the parameter of main interest.

For the signal model in (1) it is well known that the maximum
likelihood estimate (MLE) of the frequency is given by the
location at which the Periodogram attains its maximum [1]. As a
complement to the Periodogram, a large number of papers on
low-complexity estimators have been published. Basically, they can be
divided into two classes, data based and correlation based.
The well known weighted phase averager [2] belongs to the former
class. In this paper, we consider correlation based estimators,
i.e., an estimate of the frequency is obtained from one or several
estimated entries of the autocorrelation sequence $\{r(m)\}$ of $y(n)$,

$$r(m) = E[y(n)\, y^*(n-m)] = |a|^2 e^{i\omega m} + \sigma^2 \delta_{m,0}. \tag{2}$$

Here, $\delta_{m,0}$ is the Kronecker delta and $(\cdot)^*$ denotes complex
conjugate. From the data we can form the sample correlation sequence
$\{\hat r(m)\}$ where $\hat r(m)$ is, for example, the unbiased estimator

$$\hat r(m) = \frac{1}{N-m} \sum_{n=m}^{N-1} y(n)\, y^*(n-m). \tag{3}$$

Considering computational complexity, one may note from the
above that correlation based methods are only an alternative to the
Periodogram based methods when the number $K$ of correlation
lags is fixed, and $K \ll \log(N)$. Due to the averaging in $\hat r(m)$,
this class of methods has SNR thresholds between the threshold
of MLE and the thresholds of data based methods. The correlation
based estimators may, however, not be statistically efficient at high
SNR.
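As an illustration of (3), the following is a minimal sketch of the unbiased sample correlation together with the simplest correlation based estimator, the phase of the lag-one correlation. The function names `sample_correlation` and `lag_one_frequency` are hypothetical and chosen here for illustration only; NumPy is assumed.

```python
import numpy as np

def sample_correlation(y, m):
    """Unbiased sample correlation at lag m, following (3).

    Sums y(n) * conj(y(n - m)) for n = m, ..., N - 1 and divides by N - m.
    """
    y = np.asarray(y)
    N = y.size
    return np.sum(y[m:] * np.conj(y[:N - m])) / (N - m)

def lag_one_frequency(y):
    """Frequency estimate from the phase of the lag-one sample correlation.

    np.angle returns a value in (-pi, pi], so no unwrapping is needed here.
    """
    return np.angle(sample_correlation(y, 1))

# Illustrative use on a noisy single tone (parameter values are assumptions).
rng = np.random.default_rng(1)
n = np.arange(24)
noise = 0.1 * (rng.standard_normal(24) + 1j * rng.standard_normal(24))
y_demo = np.exp(1j * 0.71 * n) + noise
print(lag_one_frequency(y_demo))
```

Each lag costs on the order of N complex multiplications, which is why the number of lags K has to stay small for the approach to compete with an FFT based Periodogram.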

Clearly, one can use the truncated sequence $\hat r(1), \ldots, \hat r(K)$
and the observation that the sequence itself is a noise corrupted
sinusoidal signal with the same frequency as the raw data and fit
the unwrapped phase to a straight line [3]. By considering estimators
formed from an arbitrary set of correlations, we show that it is
possible to improve their accuracy (in terms of a lower error variance,
or a lower SNR threshold) while retaining a low numerical
complexity. We present a methodology to find correlation based
estimators with minimal error variance performance, subject to
an arbitrary set of correlations $\{r(L_1), \ldots, r(L_K)\}$. By a simple
example, we illustrate that using the given methodology, we
are able to outperform many of the previously published tone
frequency estimators in the trade-off between accuracy/threshold and
complexity.

Estimation  of  phase  parameters  by  linear  regression  requires 
an  unwrapped  phase.  This  is  often  done  on  the  entire  data  set,  for 
which  the  process  is  straightforward.  In  [4]  a  frequency  estimator 
based  on  two  correlations  was  proposed.  It  was  further  shown  how 
the  frequency  ambiguity  can  be  resolved  if  the  correlation  lags  are 
relatively  prime.  We  take  this  approach  a  step  further  and  show 
that  the  phase  unwrapping,  from  an  arbitrary  set  of  phases,  is  an 
integer  assignment  problem  related  to  frequency  estimation.  By 
invoking  the  Chinese  remainder  theorem  (CRT)  we  propose  an  ef¬ 
ficient  implementation  of  the  phase  unwrapping. 


0-7803-701 1-2/01/$10.00  ©2001  IEEE 




2.  FREQUENCY  ESTIMATION  FROM  SETS  OF 
CORRELATIONS 

From (2) it is evident that information about the frequency is gathered
in the phase angle of $r(m)$, that is, for $m \neq 0$,

$$m\omega = \angle[r(m)] + 2\pi\ell \tag{4}$$

for some integer $\ell$ satisfying $0 \le \ell < m$. Here, $\angle[\cdot]$ denotes the
phase angle in $[0, 2\pi)$. For notational brevity and without loss of
generality, $m$ is restricted to be positive, and $\omega$ is mapped to the
interval $[0, 2\pi)$ instead of $[-\pi, \pi)$. For $m = 1$ the frequency can
be unambiguously estimated, i.e. $\hat\omega = \angle[\hat r(1)]$, but it is known to
have poor performance [5]. With prior knowledge of the frequency
interval of interest, $\ell$, it is shown that the error variance can be
significantly reduced by increasing $m$ [5], i.e., an estimator

$$\hat\omega = \frac{\angle[\hat r(m)] + 2\pi\ell}{m}.$$

If $\ell$ is not known a priori the frequency cannot be uniquely determined
from one correlation only. In [4] a method, based on two
correlations with relatively prime correlation lags ($m = L_1$, $m = L_2$),
was introduced to resolve the ambiguity. Here we extend this
to the general case of $K$ correlations, and further suggest a simple
implementational design using shift registers.

Starting with the problem of estimating the frequency from
phase information of the correlations, a system of $K$ equations
and $K + 1$ unknowns $(\omega, \ell_1, \ldots, \ell_K)$ follows from (4), i.e.,

$$\mathbf{L}\,\omega = \boldsymbol{\varphi} + 2\pi\boldsymbol{\ell}.$$

Here, $\mathbf{L} = [L_1, \ldots, L_K]^T$ and $\boldsymbol{\varphi}$, $\boldsymbol{\ell}$ are defined accordingly and
further, $\varphi_k = \angle[r(L_k)]$. In an ideal case (no noise) only one $\omega$
satisfies all the $K$ equations if there is no common divisor among
$\{L_k\}$. With noisy measurements we can solve for $\omega$ in a least
squares sense for every combination of $\{\ell_k\}$ and pick the best one.
The weighted least squares solution is

$$\hat{\boldsymbol{\ell}} = \arg\min_{\boldsymbol{\ell} \in C_\ell} (\hat{\boldsymbol{\varphi}} + 2\pi\boldsymbol{\ell})^T \boldsymbol{\Pi}_{\mathbf{L}}^{\perp} (\hat{\boldsymbol{\varphi}} + 2\pi\boldsymbol{\ell}) \tag{5}$$

$$\hat\omega = \frac{\mathbf{L}^T \mathbf{W} (\hat{\boldsymbol{\varphi}} + 2\pi\hat{\boldsymbol{\ell}})}{\mathbf{L}^T \mathbf{W} \mathbf{L}} \tag{6}$$

$$\boldsymbol{\Pi}_{\mathbf{L}}^{\perp} = \mathbf{W} - \frac{\mathbf{W} \mathbf{L} \mathbf{L}^T \mathbf{W}}{\mathbf{L}^T \mathbf{W} \mathbf{L}}$$

where $(\cdot)^T$ denotes transpose, $C_\ell$ is the set of feasible combinations
of $\boldsymbol{\ell}$ and $\mathbf{W}$ is a weighting matrix. The original frequency
estimation problem is now separated into two subproblems. First,
phase unwrapping, i.e., determining the unknown set $\{\ell_k\}$, (5).
Secondly, frequency estimation from the unwrapped phase, i.e.,
(6). Despite the joint nature of the problem, phase unwrapping and
frequency estimation are often treated separately in the literature.
This simplifies analyses significantly. A proper analysis requires
a careful treatment of errors in the phase unwrapping. An optimal
weighting will then be frequency dependent, hence not very applicable
in practice. In a high SNR scenario though, the probability
of an incorrect phase unwrapping is negligible. Therefore, the
assumption of a correct phase unwrapping, used in the performance
analyses, is justified.

In Sect. 2.2 we introduce a new alternative approach to the
phase unwrapping in (5). But first, we assume that $\boldsymbol{\ell}$ is known or
estimated, and consider the frequency estimation problem given a
set of $K$ correlations.

2.1. Frequency Estimator

Let $\hat r(L_1), \ldots, \hat r(L_K)$ (such that $L_1 < \ldots < L_K$) be $K$ sample
correlations. Any frequency estimator based on phase information
can be formed as a weighted average of the unwrapped phase, i.e.,

$$\hat\omega_{\alpha}(\mathbf{L}) = \boldsymbol{\alpha}^T (\hat{\boldsymbol{\varphi}} + 2\pi\hat{\boldsymbol{\ell}}) \tag{7}$$

where $\boldsymbol{\alpha}$ is a weighting vector with $\boldsymbol{\alpha}^T \mathbf{L} = 1$ for unbiased
estimates. For clarification the dependence on $\mathbf{L}$ is stated. The
asymptotic error variance of (7) as well as the optimal $\boldsymbol{\alpha}$ and correlation
lag constellation $\mathbf{L}$ are studied in detail in Sect. 3.

Note that the WLSE as well as the estimators in [3-5] are special
cases of (7). For [3] the correlation lag constellation is
$\mathbf{L} = [1, \ldots, K]^T$ and the weighting vector is
$[\boldsymbol{\alpha}]_k = 6k^2 / [K(K+1)(2K+1)]$. In [4], $K = 2$. Here, the correlation lags are
$\mathbf{L} = [2N/3, 2N/3 + 1]^T$ and the weighting vector is $\boldsymbol{\alpha} = [1, 0]^T$.

2.2. Phase Unwrapping

The optimization problem in (5) requires numerous computations,
the number of which has to be kept low for complexity reasons. We
therefore introduce another approach to this problem, which is less complex.
Define the dummy variable $P$ as

$$P \triangleq -\frac{1}{2\pi} \sum_{k=1}^{K} \beta_k L^{(k)} \angle[r(L_k)]$$

where $L^{(k)} = (\prod_{q=1}^{K} L_q)/L_k$ and $\{\beta_k\}$ are integers that satisfy
$\sum_{k=1}^{K} \beta_k = 0$. We now show that the set $\{\ell_k\}$ can be uniquely
determined from $P$ if and only if $\{L_k\}$ are all relatively prime and
$\{\beta_k\}$ properly chosen. It is straightforward to verify that

$$P = \sum_{k=1}^{K} \beta_k L^{(k)} \ell_k = \text{integer}$$

$$\ell_k = (b_k P \bmod L_k)$$

where $b_k$ is the modulo $L_k$ inverse of $\beta_k L^{(k)}$, i.e., the integer $b_k$
satisfies $(b_k \beta_k L^{(k)} \bmod L_k) = 1$. This is an example of the
CRT (see e.g. [6]) and a direct consequence of this theorem is
that $\{\ell_k\}$ are identifiable if and only if $\{\beta_k, L_k\}$ are all relatively
prime.

For noisy measurements, $P$ will not likely be an integer and
we have to round towards the closest one. This introduces an error
of course, but the error probability can be reduced by choosing $\beta_k$
small in magnitude. As $K$ and/or $L_k$ increases, the variance of $P$
increases and can be quite large due to large values of the product
$\beta_k L^{(k)}$. This increases the probability of an incorrect phase
unwrapping, which is the main contribution to the threshold effect
occurring in non-linear estimation. In practice the algorithm is
applied on subsets with two correlations at a time, rendering a lower
error probability. For a (sub)set of $K = 2$, choose $\beta_k = \pm 1$. This
special case gives the setup in [4].

An alternative to the modulo operator is tabulation. We can
generate a table of all possible $P$ and store the values of $\{\ell_k\}$.
Although a table look up can be very efficient, the modulo
approach has its advantages. It can for example be implemented with
shift registers. Finally, the proposed phase unwrapping algorithm
is summarized in Table 1.
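The modulo based phase unwrapping described above can be sketched as follows. This is an illustrative sketch only, under stated assumptions: the lags are taken to be pairwise relatively prime, each $\beta_k$ is relatively prime to $L_k$ with $\sum_k \beta_k = 0$, and the function name `unwrap_crt` is hypothetical. Python's built-in `pow(x, -1, m)` (Python 3.8 or later) is used for the modular inverse.

```python
import math

def unwrap_crt(phases, lags, betas):
    """Recover the integers l_k from wrapped phases in [0, 2*pi).

    phases[k] is the phase of the correlation at lag lags[k].
    Assumes pairwise relatively prime lags, sum(betas) == 0, and
    gcd(betas[k], lags[k]) == 1 for every k.
    """
    prod = math.prod(lags)
    l_sup = [prod // L for L in lags]          # L^(k) = (prod_q L_q) / L_k
    acc = sum(b * c * p for b, c, p in zip(betas, l_sup, phases))
    P = -round(acc / (2.0 * math.pi))          # rounded dummy variable P
    ells = []
    for L, b, c in zip(lags, betas, l_sup):
        b_inv = pow((b * c) % L, -1, L)        # modulo-L_k inverse of beta_k L^(k)
        ells.append((b_inv * P) % L)
    return ells

# Noise-free illustration with an assumed frequency and lag set.
omega = 4.0
lags = [3, 4, 5]
betas = [1, 1, -2]
phases = [(L * omega) % (2.0 * math.pi) for L in lags]
ells = unwrap_crt(phases, lags, betas)
print(ells, [(p + 2.0 * math.pi * l) / L for p, l, L in zip(phases, ells, lags)])
```

With noisy phases the rounding step can select the wrong integer, and the risk grows with the products $\beta_k L^{(k)}$, which is the reason the text recommends working with pairs of correlations.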
1. Let $\angle[\hat r(L_k)] \in [0, 2\pi)$ denote the phase of the sample
correlation $\hat r(L_k)$, where $\{L_k\}$ are $K$ relatively prime integers.
Choose the set $\{\beta_k\}$ properly, i.e., integers satisfying
$\sum_{k=1}^{K} \beta_k = 0$ and relatively prime to $\{L_k\}$.

2. With $L^{(k)} = (\prod_{q=1}^{K} L_q)/L_k$, calculate

$$\hat P = -\mathrm{round}\left[ \frac{1}{2\pi} \sum_{k=1}^{K} \beta_k L^{(k)} \angle[\hat r(L_k)] \right] \tag{8}$$

3. Find the integers $\{\ell_k\}$ that satisfy $\hat P = \sum_{k=1}^{K} (\beta_k L^{(k)}) \ell_k$.
The solution is unique and can for example be found by
tabulation, or

$$\ell_k = (b_k \hat P \bmod L_k)$$

where $b_k$ is the modulo $L_k$ inverse of $\beta_k L^{(k)}$, i.e., satisfies
$(b_k \beta_k L^{(k)} \bmod L_k) = 1$.

Table 1. A method to resolve the ambiguity in correlation based
tone frequency estimation.

3. PERFORMANCE ANALYSIS

In this section the performance of the weighted average estimator
in (7) is analyzed. We derive an expression for the asymptotic
error variance as well as a lower bound tighter than the Cramer-Rao
bound (CRB). The performance is a function of $\boldsymbol{\alpha}$ and $\mathbf{L}$, and
we further investigate the choice of them.

Let $\mathbf{R}$ be the covariance matrix of $\hat{\boldsymbol{\varphi}}$. Then the variance of the
weighted estimator $\hat\omega_{\alpha}(\mathbf{L})$, as given in (7), is

$$\mathrm{var}[\hat\omega_{\alpha}] = \boldsymbol{\alpha}^T \mathbf{R} \boldsymbol{\alpha}.$$

In Lemma 3.1 the asymptotic covariance matrix of $\hat{\boldsymbol{\varphi}}$ (as SNR $\to \infty$)
is given explicitly.

Lemma 3.1 (Asymptotic Covariance Matrix) For fixed $N$ and
$L_l \ge L_k$, let $\{\hat r(L_k)\}$ be estimates according to (3). If the phase
is correctly unwrapped ($\boldsymbol{\ell}$ is known), then element $(k, l)$ of the
asymptotic covariance matrix $\mathbf{R}$ of the phases $\hat{\boldsymbol{\varphi}}$, as SNR $\to \infty$,
is

$$[\mathbf{R}]_{k,l} = \frac{\min\{L_k, N - L_l\}}{\mathrm{SNR}\,(N - L_k)(N - L_l)}.$$

Proof: The proof is given in [7]. ■

3.1. Optimal Weighting

With use of the Gauss-Markov Theorem the optimal (minimal variance)
weighting scheme, for a given SNR, is

$$\boldsymbol{\alpha}_{\mathrm{opt}}(\mathbf{L}) = \frac{\mathbf{R}^{-1} \mathbf{L}}{\mathbf{L}^T \mathbf{R}^{-1} \mathbf{L}} \tag{9}$$

with the corresponding estimator $\hat\omega_{\mathrm{opt}}(\mathbf{L}) = \boldsymbol{\alpha}_{\mathrm{opt}}^T(\mathbf{L})(\hat{\boldsymbol{\varphi}} + 2\pi\hat{\boldsymbol{\ell}})$.
Note that this coincides with the WLSE when the weighting matrix
$\mathbf{W} = \mathbf{R}^{-1}$. The variance of this estimator is

$$\mathrm{var}[\hat\omega_{\mathrm{opt}}(\mathbf{L})] = \frac{1}{\mathbf{L}^T \mathbf{R}^{-1} \mathbf{L}}$$

and may serve as a lower bound on the performance of this class
of frequency estimators, given the correlation lags $\mathbf{L}$. This bound
is tighter than the CRB given by [1]

$$\mathrm{CRB}[\omega] = \frac{6}{\mathrm{SNR}\, N (N^2 - 1)}.$$

With use of Lemma 3.1, an explicit expression of the asymptotic
(as SNR $\to \infty$) performance is given. This case is studied in
detail in Section 3.2.

The weighting $\boldsymbol{\alpha}_{\mathrm{opt}}$ in (9) is SNR dependent, which without
prior knowledge of the SNR is of little practical use. In addition, it
is difficult to derive an explicit expression of the covariance matrix
for an arbitrary SNR. This can be overcome by considering a high
or low SNR case, for a suboptimal weighting scheme:

$$\boldsymbol{\alpha}_{\infty} = \lim_{\mathrm{SNR} \to \infty} \boldsymbol{\alpha}_{\mathrm{opt}}, \qquad \boldsymbol{\alpha}_{0} = \lim_{\mathrm{SNR} \to 0} \boldsymbol{\alpha}_{\mathrm{opt}}.$$

The estimator in (7) with $\boldsymbol{\alpha} = \boldsymbol{\alpha}_{\infty}$ and $\boldsymbol{\alpha} = \boldsymbol{\alpha}_{0}$ results in the
estimator with lowest variance in the limit of high SNR and low
SNR, respectively.

3.2.  The  High  SNR  Case 


The  (sub)optimal  weighting  schemes  are  subject  to  a  given  corre¬ 
lation  lag  constellation  L.  By  choosing  the  constellation  properly 
we  can  increase  the  performance  further.  In  Lemma  3.2  the  proper 
constellation  for  high  SNR  together  with  the  weights  and  the  re¬ 
sulting  variance  are  stated. 


Lemma 3.2 (Proper Choice of Correlations) In the limit (as
SNR $\to \infty$) the proper correlation lag constellation is given by

$$L_k = \frac{k}{2K+1} N, \qquad k = 1, \ldots, K < \frac{N}{2}. \tag{10}$$

If $L_k$ is a non-integer value, it is rounded to the closest integer.
Further, every lag $L_k$ has a mirror point $N - L_k$ with the same
performance asymptotically in SNR. It follows that the suboptimal
weighting is

$$\alpha_{\infty,k} = \frac{3k(2K + 1 - k)}{K(K+1)(2K+1)}$$

resulting in an asymptotic variance of $\hat\omega_{\infty} = \boldsymbol{\alpha}_{\infty}^T (\hat{\boldsymbol{\varphi}} + 2\pi\hat{\boldsymbol{\ell}})$ as

$$\mathrm{var}[\hat\omega_{\infty}] = \frac{6(2K+1)^2}{\mathrm{SNR}\, N^3 ((2K+1)^2 - 1)}.$$

Proof: The proof is given in [7]. Strictly, an optimization with
respect to $L_k$ is subject to the condition that it is an integer. If it
is not, we choose the closest one. If $N$ is large this quantization
effect is negligible. ■

From Lemma 3.2 we see that the resulting efficiency tends to

$$\text{efficiency} = \frac{\mathrm{var}[\hat\omega_{\infty}]}{\mathrm{CRB}} = \frac{(2K+1)^2 (N^2 - 1)}{((2K+1)^2 - 1) N^2}$$

as SNR $\to \infty$. Consider the special case when $2K + 1 = N$.
Then the variance becomes $\mathrm{var}[\hat\omega_{\infty}] = \mathrm{CRB}[\omega]$, and the method
is asymptotically (as SNR $\to \infty$) efficient for any fixed $N$. We
conclude that we do not need all $N - 1$ correlations to
make a correlation based estimator asymptotically efficient. Only
half of the set is needed. Note that for a sequence of correlations
conventional phase unwrapping applies. For this special case the
constellation equals that in [3], but the weights differ. Hence, Fitz'
estimator is not efficient.
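The high SNR results can be checked numerically. The sketch below builds the covariance matrix of Lemma 3.1, forms Gauss-Markov weights normalised so that $\boldsymbol{\alpha}^T \mathbf{L} = 1$, and compares the resulting variance with the CRB. The function names are hypothetical, NumPy is assumed, and the weights follow the unbiasedness convention of (7), so they need not match other normalisations term by term.

```python
import numpy as np

def phase_covariance(lags, N, snr):
    """Asymptotic (high SNR) phase covariance matrix of Lemma 3.1."""
    L = np.asarray(lags, dtype=float)
    K = L.size
    R = np.empty((K, K))
    for k in range(K):
        for l in range(K):
            lo, hi = min(L[k], L[l]), max(L[k], L[l])   # enforce L_k <= L_l
            R[k, l] = min(lo, N - hi) / (snr * (N - lo) * (N - hi))
    return R

def optimal_weights(lags, R):
    """Gauss-Markov weights R^{-1} L / (L^T R^{-1} L)."""
    L = np.asarray(lags, dtype=float)
    r_inv_l = np.linalg.solve(R, L)
    return r_inv_l / (L @ r_inv_l)

def optimal_variance(lags, R):
    """Variance 1 / (L^T R^{-1} L) of the optimally weighted estimator."""
    L = np.asarray(lags, dtype=float)
    return 1.0 / (L @ np.linalg.solve(R, L))

def crb(N, snr):
    """Cramer-Rao bound 6 / (SNR N (N^2 - 1))."""
    return 6.0 / (snr * N * (N ** 2 - 1))

# Special case 2K + 1 = N (here K = 3, N = 7): the ratio to the CRB should be one.
R = phase_covariance([1, 2, 3], 7, 10.0)
print(optimal_variance([1, 2, 3], R) / crb(7, 10.0))
```

The same functions applied to a single lag of 2N/3 with N = 24 give a ratio close to 9/8, in line with the efficiency quoted for the Tufts and Fiore setup in the design example.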
3.3. The Low SNR Case and Threshold Effects

It is well known that frequency estimation suffers from threshold
effects. Algorithms that rely on phase data have a higher threshold
due to their use of phase unwrapping. An incorrect phase unwrapping
gives a dramatic error in the frequency estimate, which is the
main contribution to the threshold effect.

The error probability of $\hat P$ in (8) increases with the values in
$\mathbf{L}$. Thus, the mirror correlation lag constellations of (10) suffer
from a higher probability of an incorrect phase unwrapping than the
constellation in (10). This contributes to a higher SNR threshold,
whereas the variances are asymptotically equal. Hence, in practice
the constellation in (10) is chosen.

As the SNR decreases a correct analysis must incorporate both
the probability of an incorrect phase unwrapping and a variance
expression. For SNR values below the threshold the phases
tend to be uncorrelated and uniformly distributed over $[0, 2\pi)$ and
the frequency cannot be determined. In this case the weighting has
no effect on the performance. Thus, if the frequency estimator is
to operate in a low SNR environment one has to choose a correlation
lag constellation $\mathbf{L}$ that gives a low SNR threshold. This is in
general achieved for small $L_k$.

4.  DESIGN  EXAMPLE 

To illustrate the performance of the proposed design strategy we
consider the case $K = 3$, i.e., all estimators use three correlations,
except Tufts & Fiore (T&F) which uses two correlations
by construction [4]. The experimental setup is a single
complex-valued sinusoid with $\omega = 0.71$ and $N = 24$ sample points. In
Fig. 1 the root mean square error (RMS) is plotted versus the SNR.
The RMS is calculated over 5000 trials. It is easily verified that
the proposed design outperforms most of the previous estimators
(Fitz [3], ESPRIT and Tufts & Fiore (T&F) [4]) for high SNR
scenarios. We use the high SNR suboptimal method with the
constellation in (10). From Fig. 1 it seems like both the suboptimal
estimator and the T&F method are efficient at high SNR, but this
is not true. In fact, their efficiencies are 49/48 and 9/8
respectively at high SNR.

The main contribution to the complexity is the calculation of
the sample correlations, which is compared in Table 2. Small
correlation lags are used for the estimator with the constellation
in (10), which renders a low SNR threshold. This comes at a cost in
complexity. We have a trade-off between complexity and
accuracy/threshold that has to be treated from case to case. In [7] a
design strategy given a numerical complexity is introduced. The
strategy determines a good constellation in a trade-off between
asymptotic performance and low SNR threshold. The reference
also includes a more detailed complexity analysis.


              Proposed     Proposed
              L in (10)    Mirror      Fitz       T&F        Kay [2]

Adds/Mults    ~ 6KN        ~ 2KN       ~ 8KN      ~ 5.3N     ~ 7N

Phases        K            K           K          2          N - 1

Table 2. Number of real valued multiplications/additions as well
as the number of phase calculations for the different estimators.
In addition, a comparison with Kay's estimator is included.


5.  CONCLUSIONS 

In  this  paper  we  proposed  a  design  strategy  for  correlation  based 
frequency  estimators.  From  an  arbitrary  set  of  sample  correlations 
we  formed  an  estimator  by  weighting  the  unwrapped  phase.  An 
optimal  weighting  scheme  was  derived  and  as  a  complement,  one 
suboptimal  strategy  (high  SNR  case)  was  analyzed.  For  good  per¬ 
formance  we  showed  how  to  choose  the  correlation  lag  constella¬ 
tion  properly.  We  also  proposed  a  new  method  of  phase  unwrap¬ 
ping,  based  on  an  integer  assignment  problem  and  the  CRT.  For 
easy  reference,  see  Table  1 . 

We compared the performance of our design with other
estimators in the same class. These estimators are special cases of
the proposed design, and we can outperform them as well as many
similar estimators.

The analysis assumes a correct phase unwrapping, i.e., $\hat P = P$.
For reasonably high SNR the error probability is negligible, but
it is a main error source for SNR < 0 dB [4,7].
ABSTRACT

Sequential  Bayesian  estimation  for  dynamic  state  space  mod¬ 
els  involves  recursive  estimation  of  hidden  states  based  on 
noisy  observations.  The  update  of  filtering  and  predictive 
densities  for  nonlinear  models  with  non-Gaussian  noise  us¬ 
ing  Monte  Carlo  particle  filtering  methods  is  considered. 
The  Gaussian  particle  filter  (GPF)  is  introduced,  where 
densities  are  approximated  as  a  single  Gaussian,  an  as¬ 
sumption  which  is  also  made  in  the  extended  Kalman  fil¬ 
ter  (EKF).  It  is  analytically  shown  that,  if  the  Gaussian 
approximations  hold  true,  the  GPF  minimizes  the  mean 
square error of the estimates asymptotically. The simulation
results indicate that the filter has improved performance
compared to the EKF, especially for highly nonlinear
models where the EKF can diverge.

1.  INTRODUCTION 

Nonlinear  filtering  problems  arise  in  many  fields  including 
statistical  signal  processing,  economics,  statistics,  biostatis¬ 
tics  and  engineering  such  as  communications,  radar  track¬ 
ing,  sonar  ranging,  target  tracking,  and  satellite  navigation. 
Many of these problems can be written in the form of the
so called Dynamic State Space (DSS) model [1]. The signal
of interest $\{x_n; n \in \mathbb{N}\}$, $x \in \mathbb{R}^{m_x}$, is an unobserved
(hidden) Markov process of initial distribution $p(x_0)$
represented by the distribution $p(x_n|x_{n-1})$. The observations
$\{y_n; n \in \mathbb{N}\}$, $y \in \mathbb{R}^{m_y}$, are conditionally independent given
the state process $\{x_n; n \in \mathbb{N}\}$ and represented by the
distribution $p(y_n|x_n)$. Alternatively, the model can be written
as

$$\begin{aligned} x_n &= f(x_{n-1}) + u_n && \text{(process equation)} \\ y_n &= h(x_n) + v_n && \text{(observation equation)} \end{aligned} \tag{1}$$

where $u_n$ and $v_n$ are additive, random noise vectors of given
distributions.

In a Bayesian context, our aim is to estimate recursively
in time the marginal posterior distribution referred to as
the filtering distribution $p(x_n|y_{0:n})$ and the predictive
distribution $p(x_{n+1}|y_{0:n})$, where $y_{0:n} = \{y_0, \ldots, y_n\}$. Given
these densities, an estimate of the state can be determined
for any performance criterion suggested for the problem.
The filtering density or the marginal posterior of the state
at time $n$ can be written as

$$p(x_n|y_{0:n}) = C_n\, p(x_n|y_{0:n-1})\, p(y_n|x_n) \tag{2}$$

where $C_n = \left(\int p(x_n|y_{0:n-1})\, p(y_n|x_n)\, dx_n\right)^{-1}$ is the
normalizing constant. Furthermore, the predictive density can be
expressed as

$$p(x_{n+1}|y_{0:n}) = \int p(x_{n+1}|x_n)\, p(x_n|y_{0:n})\, dx_n. \tag{3}$$

This work was supported by the National Science Foundation

When  the  model  is  linear  with  additive  Gaussian  noise, 
and  p(xo)  is  Gaussian,  the  filtering  and  predictive  densi¬ 
ties  are  Gaussian  and  the  Kalman  filter  provides  the  mean 
and  covariance  sequentially,  which  is  the  optimal  Bayesian 
solution [2]. However, for most nonlinear models and
non-Gaussian noise problems, closed form analytic expressions
for the posterior densities do not exist in general. Numerical
solutions often require high dimensional integrations
which are not practical to implement. As a result, several
approximations which are more tractable have been proposed.

A  class  of  filters  called  Gaussian  filters  provide  Gaussian 
approximations  to  the  filtering  and  predictive  densities.  For 
example,  the  EKF  linearizes  the  nonlinearities  around  the 
current  state  and  provides  Gaussian  approximations  to  the 
densities.  Although  the  EKF  has  been  successfully  imple¬ 
mented  in  some  problems,  in  others  it  diverges  or  provides 
very  poor  approximations.  This  is  especially  emphasized 
when  the  model  is  highly  nonlinear  or  when  the  posterior 
densities  are  multimodal.  In  such  cases  however,  significant 
improvements  are  possible.  Efforts  to  improve  upon  the 
EKF  have  led  to  new  filters  by  Julier  et  al.  [3]  and  Ito  et 
al.  [4],  which  use  deterministic  sets  of  points  in  the  space  of 
the  state  variable  to  obtain  more  accurate  approximations 
to  the  mean  and  covariance  than  the  EKF. 

Recently,  particle  based  sampling  filters  have  been  used 
to  update  the  posterior  distributions  [5], [6], [7],  [8].  A  den¬ 
sity  is  represented  by  a  weighted  set  of  samples  from  the 
density,  which  are  propagated  through  the  dynamic  system 
to  sequentially  update  the  posterior  densities.  These  meth¬ 
ods  are  collectively  called  sequential  importance  sampling 
(SIS)  filters. 

In  this  paper,  we  present  the  GPF  for  nonlinear  DSS 
models  in  Section  2.  Similar  to  the  above  mentioned  Gaus¬ 
sian  filters,  the  GPF  approximates  (2)  and  (3)  as  Gaussians. 
The  justifications  are  that  under  this  assumption  only  the 
mean  and  covariance  need  to  be  tracked  and  given  just  the 
mean  and  covariance,  the  Gaussian  maximizes  entropy  of 
the  random  variable  or  it  is  the  least  informative  distribu¬ 
tion.  The  GPF  updates  the  Gaussian  approximations  using 
a  particle  based  approach,  wherein  random  samples  are  gen¬ 
erated  and  Monte  Carlo  estimates  of  mean  and  covariance 
are provided. In fact, all moments can be calculated similarly.
It is shown analytically that, as the number of particles
tends to infinity, the estimates converge almost surely to
the minimum mean square estimates (given that the Gaussian
assumption holds true). It is important to note that
unlike the EKF, the assumption of additive Gaussian noise
can be relaxed for the GPF. The noises can in general
be non-Gaussian and non-additive, as long as the
Gaussian approximation is valid. The GPF has improved
performance compared to the EKF as demonstrated by the
simulations in Section 3. Finally, we conclude the paper in
Section 4.


2. GAUSSIAN PARTICLE FILTERING

The GPF applies particle filtering methodology [5],[7],[9] to
update the mean and covariance based on the Bayesian update
equations (2) and (3). The basic idea in Monte Carlo
methods is to represent a distribution $p(x_n)$ of a random
variable $x_n$ by a collection of samples (particles) from that
distribution. $M$ particles, $\mathcal{X} = \{x_n^{(1)}, \ldots, x_n^{(M)}\}$, from a so
called importance sampling (IS) distribution $\pi(x_n)$ (which
satisfies certain conditions; see [9] for details) are generated.
The particles are then weighted as $w^{(j)} = p(x_n^{(j)})/\pi(x_n^{(j)})$.
If $\mathcal{W} = \{w^{(1)}, \ldots, w^{(M)}\}$, then the set $\{\mathcal{X}, \mathcal{W}\}$ represents
samples from the posterior distribution $p(x_n)$. Monte Carlo
integration suggests that the estimate of

$$E_p(g(x_n)) = \int g(x_n)\, p(x_n)\, dx_n \tag{4}$$

can be computed as

$$\hat E_p(g(x_n)) = \frac{\sum_{j=1}^{M} w^{(j)} g(x_n^{(j)})}{\sum_{j=1}^{M} w^{(j)}}. \tag{5}$$

Using the Strong Law of Large Numbers it can be shown
that

$$\hat E_p(g(x_n)) \to E_p(g(x_n)) \tag{6}$$

almost surely as $M \to \infty$; see for example [9]. The posterior
density can be approximated as

$$p(x_n)\, dx_n \approx \hat P(dx_n) = \frac{\sum_{j=1}^{M} w^{(j)} \delta_{x_n^{(j)}}(dx_n)}{\sum_{j=1}^{M} w^{(j)}} \tag{7}$$

where $\delta_{x_n}(dx_n)$ is the Dirac delta function. For the DSS
models, SIS filters have been developed, which essentially
obtain particles and their weights from the posterior densities
in a recursive manner. However, a phenomenon called
sample degeneration occurs where only a few particles
representing the distribution have significant weights. A
procedure called resampling [7] is applied to mitigate this problem,
but it can give limited results and can be computationally
expensive.

Since the GPF approximates posterior densities as Gaussians,
particle resampling is not required, as long as the
Gaussian approximations are valid. This results in an
advantage of the GPF over SIS methods. Using the underlying
ideas, the update mechanism for GPF is explained below.

The density of a Gaussian random variable $x$ is written
as $\mathcal{N}(x; \mu, \Sigma)$ where the $m$ dimensional vector $\mu$ is
the mean, and the covariance is the positive definite matrix
$\Sigma$. Assume that at time $n = 1$, we have $p(x_1|y_0) =
\mathcal{N}(x_1; \mu_0, \Sigma_0)$, where $\mu_0$ and $\Sigma_0$ are chosen based on prior
information. As new measurements are received, measurement
and time updates are performed to obtain the filtering
and predictive densities as discussed in the following
sections.

2.1. Measurement Update

After receiving the n-th observation $y_n$, from (2) the filtering
density can be approximated as

p(x„|y0:„)  ~  C„p(yn\x„  )A7(x„;  pn ,  E„).  (8) 

The GPF measurement update approximates the above
density as a Gaussian, so that the mean and covariance of
$p(x_n|y_{0:n})$ are preserved, i.e.,

$$p(x_n|y_{0:n}) \approx \mathcal{N}(x_n; \mu_n, \Sigma_n). \qquad (9)$$

In general, analytical expressions for the mean $\mu_n$ and covariance
$\Sigma_n$ of $p(x_n|y_{0:n})$ are not available. However, for
the GPF update, Monte Carlo estimates of $\mu_n$ and $\Sigma_n$ can
be computed from (8), where samples $x_n^{(j)}$ are obtained from
an importance sampling function $\pi(x_n|y_{0:n})$. The measurement
update algorithm is given in Chart 1.


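GPF - Measurement update algorithm.

1. Obtain samples from the density $\pi(x_n|y_{0:n})$ and denote
them as $\{x_n^{(j)}\}_{j=1}^{M}$.

2. Obtain the respective weights by

$$\tilde{w}_n^{(j)} = \frac{p(y_n|x_n^{(j)})\, \mathcal{N}(x_n = x_n^{(j)}; \bar{\mu}_n, \bar{\Sigma}_n)}{\pi(x_n^{(j)}|y_{0:n})} \qquad (10)$$

3. Normalize the weights as

$$w_n^{(j)} = \frac{\tilde{w}_n^{(j)}}{\sum_{j=1}^{M} \tilde{w}_n^{(j)}} \qquad (11)$$

4. Estimate the mean and covariance by

$$\mu_n = \sum_{j=1}^{M} w_n^{(j)} x_n^{(j)}, \qquad \Sigma_n = \sum_{j=1}^{M} w_n^{(j)} (x_n^{(j)} - \mu_n)(x_n^{(j)} - \mu_n)^T \qquad (12)$$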


GPF - Time update algorithm.

1. Draw samples from $\mathcal{N}(x_n; \mu_n, \Sigma_n)$ and denote them
as $\{x_n^{(j)}\}_{j=1}^{M}$.

2. For $j = 1, \ldots, M$, sample from $p(x_{n+1}|x_n = x_n^{(j)})$ to
obtain $\{x_{n+1}^{(j)}\}_{j=1}^{M}$.

3. Compute the mean $\bar{\mu}_{n+1}$ and covariance $\bar{\Sigma}_{n+1}$ by taking
sample means and covariances.


Chart  1. 
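
The two halves of Chart 1 can be sketched for a scalar state as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the IS density is taken to be the predictive Gaussian, so the weight reduces to the likelihood, and the function names, the linear Gaussian test model and the particle count are all hypothetical.

```python
import numpy as np

def gpf_measurement_update(y, mu_bar, var_bar, loglik, M, rng):
    """Measurement update of Chart 1 with pi chosen as N(mu_bar, var_bar)."""
    x = mu_bar + np.sqrt(var_bar) * rng.standard_normal(M)  # step 1: sample
    log_w = loglik(y, x)            # step 2: with pi = prior, weight = p(y_n | x)
    w = np.exp(log_w - log_w.max())
    w /= w.sum()                    # step 3: normalize
    mu = np.sum(w * x)              # step 4: weighted mean ...
    var = np.sum(w * (x - mu) ** 2) # ... and weighted variance
    return mu, var

def gpf_time_update(mu, var, sample_transition, M, rng):
    """Time update of Chart 1 for a scalar state."""
    x = mu + np.sqrt(var) * rng.standard_normal(M)  # step 1: draw from N(mu, var)
    x_next = sample_transition(x, rng)              # step 2: propagate each particle
    return x_next.mean(), x_next.var()              # step 3: sample moments

# Hypothetical linear Gaussian check: y = x + v, x' = 0.5 x + u, unit variances.
rng = np.random.default_rng(0)
loglik = lambda y, x: -0.5 * (y - x) ** 2
mu, var = gpf_measurement_update(2.0, 0.0, 1.0, loglik, 50000, rng)
step = lambda x, rng: 0.5 * x + rng.standard_normal(x.shape)
mu_bar, var_bar = gpf_time_update(mu, var, step, 50000, rng)
print(mu, var, mu_bar, var_bar)
```

For this linear Gaussian case the exact Kalman answers are a filtered mean of 1 and variance of 0.5, followed by a predicted mean of 0.5 and variance of 1.125, which gives a simple sanity check on the Monte Carlo estimates.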


Theorem 1 Assume $p(x_n|y_{0:n-1}) = \mathcal{N}(x_n; \bar{\mu}_n, \bar{\Sigma}_n)$ at time
$n$. Upon receiving the $n$-th observation $y_n$, the GPF measurement
update computes the filtering density as shown in Chart
1. Then $\mu_n$ computed in (12) converges almost surely as
$M \to \infty$ to the minimum mean square error (MMSE) estimate
of $x_n$. In addition, the estimate of the MMSE given
by $\Sigma_n$ in (12) converges almost surely as $M \to \infty$ to the
true MMSE.

For a proof, see [10]. The same is true for all central and
non-central moments.

The above theorem shows that, given that the Gaussian
approximation is valid, the GPF provides the MMSE
estimate asymptotically during the measurement update,
which is clearly not true for the EKF. Hence, the GPF is
expected to perform better than the EKF.

2.1.1. Choice of $\pi(\cdot)$

The choice of the IS density $\pi(\cdot)$ depends on the problem [8], [9].
For the GPF, a simple choice for $\pi(\cdot)$ is $p(x_n|y_{0:n-1}) =
\mathcal{N}(x_n; \bar{\mu}_n, \bar{\Sigma}_n)$. Alternatively, the samples obtained in step 2
of the time update (presented in the next section) can
be used. However, this choice can be inadequate in some
applications. Another choice is $\mathcal{N}(x_n; \mu_{n|n}, \Sigma_{n|n})$, where
$\mu_{n|n}$ and $\Sigma_{n|n}$ are obtained from the measurement update
step of the EKF or from the unscented Kalman filter [3].

2.2.  Time  update 

Assume that at time $n$, it is possible to obtain samples from
$p(x_{n+1}|x_n)$. From (3) and (9),

$$p(x_{n+1}|y_{0:n}) \approx \int \mathcal{N}(x_n; \mu_n, \Sigma_n)\, p(x_{n+1}|x_n)\, dx_n. \qquad (13)$$

A Monte Carlo approximation for (13) is

$$p(x_{n+1}|y_{0:n}) \approx \frac{1}{M} \sum_{j=1}^{M} p(x_{n+1}|x_n^{(j)}) \qquad (14)$$

where $x_n^{(j)}$ are particles from $\mathcal{N}(x_n; \mu_n, \Sigma_n)$. The GPF time
update approximates $p(x_{n+1}|y_{0:n})$ as a Gaussian, such that
its mean and covariance are preserved, i.e.,

$$p(x_{n+1}|y_{0:n}) \approx \mathcal{N}(x_{n+1}; \bar{\mu}_{n+1}, \bar{\Sigma}_{n+1}). \qquad (15)$$

However, since closed form analytical expressions of $\bar{\mu}_{n+1}$ and
$\bar{\Sigma}_{n+1}$ may not be available, we compute Monte Carlo estimates
from (14). The Monte Carlo time update steps are
shown in Chart 1.

Similar to Theorem 1, it can be shown that $\bar{\mu}_{n+1}$ converges
almost surely as $M \to \infty$ to the MMSE estimate of
$x_{n+1}$ given the observations until time $n$.

3.  SIMULATION  RESULTS 

The GPF was applied to some numerical examples, and here
we present results for the univariate non-stationary growth
model (UNGM), which has been used previously in [5], [11].
We choose this model because it is highly nonlinear and is
bimodal in nature. The DSS equations are

$$x_n = \alpha x_{n-1} + \beta \frac{x_{n-1}}{1 + x_{n-1}^2} + \gamma \cos(1.2(n-1)) + u_n$$
$$y_n = \frac{x_n^2}{20} + v_n, \quad n = 1, \ldots, N \qquad (16)$$

where $v_n \sim \mathcal{N}(v_n; 0, \sigma_v^2)$ and $u_n \sim \mathcal{N}(u_n; 0, \sigma_u^2)$. This model
is highly nonlinear in both the process and observation
is  highly  nonlinear  in  both  the  process  and  observation 


               M = 20          M = 100         M = 1000
Run    EKF     GPF     SIS     GPF     SIS     GPF     SIS
 1    175.7    26.3    28.6    12.7    14.5    11.2    11.8
 2    164.7    25.7    29.4    12.9    14.0    11.2    11.6
 3    176.1    25.1    30.6    12.2     --     10.9     --
 4    160.7    29.9    27.4    13.6     --      --      --
 5    199.4    24.6    26.5    11.6    14.6    11.2     --
 6    182.3    30.8    30.3    15.2    15.5    12.8     --
 7    185.9    27.1    24.9    13.3    15.0    11.3     --
 8    175.3    27.5    28.8    11.9     --     10.9     --
 9    171.1    25.6    28.6    12.1     --     10.6     --
10    168.2    26.7    27.8    12.6     --     10.6     --

Table 1: $\mathrm{MSE}_{x_f}$ for 10 random simulation runs for the
EKF, GPF and SIS. M is the number of particles for GPF
and SIS.


equations. Notice the term in the process equation that
is independent of $x_n$ but varies with time $n$; this can be
interpreted as time varying noise. The likelihood $p(y_n|x_n)$
is bimodal when $y_n > 0$, but when $y_n < 0$ it is unimodal.
The bimodality makes the problem more difficult
to address using conventional methods.
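
The UNGM in (16) is straightforward to simulate, which helps when reproducing experiments of this kind. The sketch below assumes the standard form of the model, with the parameter defaults set to the values quoted in this section; the function name `simulate_ungm` and the seed are illustrative assumptions.

```python
import numpy as np

def simulate_ungm(N, x0=0.1, alpha=0.5, beta=25.0, gamma=8.0,
                  var_u=1.0, var_v=1.0, seed=0):
    """Generate states x_0..x_N and observations y_1..y_N from (16)."""
    rng = np.random.default_rng(seed)
    x = np.empty(N + 1)
    y = np.full(N + 1, np.nan)      # y[0] is unused: observations start at n = 1
    x[0] = x0
    for n in range(1, N + 1):
        xp = x[n - 1]
        x[n] = (alpha * xp + beta * xp / (1.0 + xp ** 2)
                + gamma * np.cos(1.2 * (n - 1))
                + np.sqrt(var_u) * rng.standard_normal())
        y[n] = x[n] ** 2 / 20.0 + np.sqrt(var_v) * rng.standard_normal()
    return x, y

x, y = simulate_ungm(100)
x_det, _ = simulate_ungm(1, var_u=0.0)   # noise-free first step for checking
```

Since the observation depends on the state only through its square, states of equal magnitude and opposite sign produce the same noise-free observation, which is the source of the bimodality discussed above.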

We compare the performance of the EKF, GPF and SIS
filters based on the following metrics. $\mathrm{MSE}_{x_f}$ is defined
by $\frac{1}{N}\sum_{n=1}^{N}(x_n - \hat{x}_n)^2$, where $\hat{x}_n = E(x_n|y_{0:n})$, which is
obtained  from  the  filtering  density.  When  the  ratio  J~  4~’° 
is  small,  then  the  bimodality  of  the  problem  is  more  severe 
and  we  expect  to  see  improved  performance  of  the  GPF  in 
the  presence  of  this  high  nonlinearity  over  that  of  the  EKF. 

Data were generated using $x_0 = 0.1$, $\sigma_u^2 = 1$, $\sigma_v^2 = 1$,
$\alpha = 0.5$, $\beta = 25$, $\gamma = 8$, and $N = 5000$ in each simulation.
The initial distribution was $p(x_0) \sim \mathcal{N}(0, 1)$. For
the GPF and SIS, the IS density chosen is the prior, given
by $p(x_n|y_{0:n-1})$ and $p(x_n|x_{n-1})$ respectively. For the GPF
and SIS, since we draw particles from $p(x_n|y_{0:n})$ in the measurement
update, we obtain a Monte Carlo estimate for $y_n$.

A large number of simulations were performed in which
all three filters were used for state estimation. Results
are shown for different choices of the number of particles,
$M = 20, 100, 1000$. In Table 1, we show $\mathrm{MSE}_{x_f}$ for 10 random
simulations, with $M$ varied for the GPF and SIS filters.
The GPF has marginally better performance than the SIS
for each choice of $M$. Even for $M = 20$,
the GPF and SIS have better performance than the EKF.
Increasing $M$ to 100 gave a significant improvement in performance;
however, increasing $M$ to 1000 did not change the
performance much. Note the significant improvement of the
GPF over the EKF in terms of the MSEs for this model.
The MSEs for the EKF were large due to its tendency to
diverge at high nonlinearities.

In  Figures  1  and  2,  we  show  a  plot  for  the  first  100  states 
and  the  estimates  obtained  using  the  EKF  and  GPF  respec¬ 
tively.  Note  the  tendency  of  the  EKF  to  track  the  opposite 

mode  of  the  bimodality,  especially  when  -  ^-°  is  small.  This 
behavior  was  observed  in  general  for  most  simulation  runs. 
In Figures 3 and 4, we plot the error $x_n - \hat{x}_n$ and the $3\sigma_{err}$
intervals, where $\sigma_{err}$ was the estimated standard deviation
of the prediction error. Note that, as expected, the errors
lie mostly within this interval for the GPF, but not
for the EKF. Also, the values of $\sigma_{err}$ for the EKF are
much higher, pointing to the occurrence of divergence of

Figure 1: Plot of the true state and estimate of the EKF


Figure  2:  Plot  of  the  true  state  and  estimate  of  the  GPF 


the filter. All of the above observations were made in most
of the simulation runs. Clearly, the GPF outperforms the
EKF significantly for this highly nonlinear example. The
GPF had marginally better performance than the SIS for
this model, but the computational complexity of the GPF is
much lower than that of the SIS, since resampling is required
for the SIS. In general, however, we cannot expect the GPF to work
better than the SIS since the Gaussian assumption is not
present in the SIS.

4.  CONCLUSION 

The Gaussian particle filter provides much better performance
than the EKF. Moreover, the additive Gaussian noise
assumption can be relaxed without any modification to the
filter algorithm. Updating the filtering and predictive densities
as Gaussians using particle based approaches has the
advantages of easy implementation and better performance.
The parallelizability of the filter makes it convenient for
VLSI implementation and hence more feasible for practical
real time applications. For extensions to this work, see
[10].


Figure 3: Plot of the prediction error and $3\sigma_{err}$ interval for
the EKF


Figure 4: Plot of the prediction error and $3\sigma_{err}$ interval for
the GPF
Abstract

In this paper, the problem of time series prediction
is studied. A Bayesian procedure based on Gaussian
process models is proposed and compared to radial
basis function networks. In our experiments, Gaussian
process models show excellent prediction performance. The
conceptual simplicity and good performance of Gaussian
process models should make them very attractive
for a wide range of problems.

1  Introduction 

In the Bayesian approach to the regression problem,
a prior distribution over the model parameters induces
a prior over functions. This prior is combined with a
noise model to yield a posterior distribution over functions,
which can then be used for predictions. In general
the prior over functions has a complex form. The
idea of Gaussian Process (GP) modeling is to place a prior
directly on the function space, without
parameterizing the model function. The simplest type of
prior over functions is called a Gaussian process.

It has been known for many years that such priors
over functions can be defined using Gaussian processes
[7]. Neal has shown that many Bayesian regression
models based on neural networks converge to Gaussian
processes in the limit of an infinite network [6].
This has motivated the application of Gaussian process
models for modeling noisy data [4] [9], noise free
data [3] and also for classification problems [2] [4].

In this paper we use Gaussian processes for the forecasting
problem, and compare their performance with another
method, the Radial Basis Function (RBF) neural network.
The advantage of the Gaussian process formulation is
that the combination of the prior and noise models
can be carried out exactly using matrix operations.
We also show how the hyperparameters of the covariance
function, which control the form of the Gaussian
process, can be estimated from the data using a maximum
likelihood approach.

2  Forecasting  Problem 

The outcomes of a phenomenon over time form a
time series. Time series are encountered in science as
well as in real life. Most commonly, time series are the
result of unknown or incompletely understood systems.
A time series $x(t)$ is defined as a function $x$ of an independent
variable $t$, generated by an unknown system.
Its main characteristic is that its evolution cannot
be described exactly. The observation of past values
of a phenomenon in order to anticipate its future behavior
represents the essence of forecasting. A typical
approach is to construct a prediction
model which takes into account previous outcomes
of the phenomenon. We can take a set of $d$ such values
$x_{t-d+1}, \ldots, x_t$ to be the model input and use the next
value $x_{t+1}$ as the target.

2.1  Parametric  approaches  to  the  prob¬ 
lem 

In a parametric approach to forecasting we express
the predictor in terms of a nonlinear function $y(x, \theta)$
parameterized by parameters $\theta$. It implements a nonlinear
mapping from the input vector $x = [x_{t-d+1}, \ldots, x_t]^T$
to the real value:

$$t^{(i)} = y(x^{(i)}, \theta) + \epsilon^{(i)}, \quad i = 1, \ldots, n \qquad (1)$$

where $\epsilon$ is a noise corrupting the data points.

Time series processing is an important application
area of neural networks. In fact, $y$ can be given by
a specified network. The output of the Radial Basis
Function (RBF) network is computed as a linear
superposition [1]:

$$y(x, \theta) = \sum_{k=1}^{K} w_k g_k(x) \qquad (2)$$

where $w_k$ ($k = 1, \ldots, K$) denotes the weights of the output
layer. The Gaussian basis functions $g_k$ are defined
as:

$$g_k(x) = \exp\left(-\frac{\|x - \mu_k\|^2}{2\sigma_k^2}\right) \qquad (3)$$

where $\mu_k$ and $\sigma_k^2$ denote means and variances. Thus
we define the parameters as $\theta = \{w_k, \mu_k, \sigma_k\}^T$ ($k = 1, \ldots, K$).

2.2  Nonparametric  approaches 

In nonparametric methods, predictions are obtained
without representing the unknown system as
an explicit parameterized function. A new method
for regression was inspired by Neal’s work [6] on
Bayesian learning for neural networks. It is an attractive
method for modelling noisy data, based on
priors over functions using Gaussian processes.

3  Gaussian  Process  models 

The Bayesian analysis of interesting forecasting
models is difficult because a simple prior over parameters
implies a complex prior distribution over functions.
Rather than expressing our prior knowledge in
terms of a prior for the parameters, we can instead
integrate over the parameters to obtain a prior distribution
for the model outputs in any set of cases. The
prediction operation is most easily carried out if all
the distributions are Gaussian. Fortunately, Gaussian
processes are flexible enough to represent a wide variety
of interesting model structures, many of which would
have a large number of parameters if formulated in a
more classical fashion.

A Gaussian process is a collection of random variables,
any finite set of which have a joint Gaussian
distribution [4]. For a finite collection of inputs,
$x = [x^{(1)}, \ldots, x^{(n)}]^T$, we consider a set of random variables
$y = [y^{(1)}, \ldots, y^{(n)}]^T$ to represent the corresponding
function values. A Gaussian process is used to
define the joint distribution between the $y$'s:

$$P(y|x) \propto \exp\left(-\frac{1}{2} y^T \Sigma^{-1} y\right) \qquad (4)$$

where the covariance matrix $\Sigma$ is given by the covariance
function:

$$\Sigma_{pq} = \mathrm{cov}(y^{(p)}, y^{(q)}) = C(x^{(p)}, x^{(q)})$$

3.1  Predicting  with  Gaussian  Process 

The goal of Bayesian forecasting is to compute
the distribution $p(y^{(n+1)}|D, x^{(n+1)})$ of the output $y^{(n+1)}$
given a test input $x^{(n+1)}$ and a set of $n$ training points
$D = \{(x^{(i)}, t^{(i)}) \mid i = 1, \ldots, n\}$.

Using Bayes' rule, we obtain the posterior distribution
for the $(n + 1)$ Gaussian process outputs. By conditioning
on the observed targets in the training set,
the predictive distribution is Gaussian with mean and
variance [8]:

$$p(y^{(n+1)}|D, x^{(n+1)}) \sim N(\mu_{y^{(n+1)}}, \sigma^2_{y^{(n+1)}}) \qquad (5)$$

where:

$$\mu_{y^{(n+1)}} = a^T Q^{-1} t$$
$$\sigma^2_{y^{(n+1)}} = b - a^T Q^{-1} a$$
$$Q_{pq} = C(x^{(p)}, x^{(q)}) + r^2 \delta_{pq}$$
$$a_p = C(x^{(n+1)}, x^{(p)}), \quad p = 1, \ldots, n$$
$$b = C(x^{(n+1)}, x^{(n+1)})$$

$r^2$ is the unknown variance of the Gaussian noise. We
get a predictive distribution, not just a point prediction.
This advantage can be used to obtain
prediction intervals that describe a degree of belief in the
predictions.
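
The predictive equations in (5) amount to a few linear solves. The following sketch assumes a covariance of the squared exponential plus constant plus linear form; the function names `cov_matrix` and `gp_predict`, the hyperparameter values and the toy sine data are illustrative assumptions rather than anything taken from the experiments.

```python
import numpy as np

def cov_matrix(A, B, v0, w, a0, a1):
    """Covariance between rows of A (n, d) and rows of B (m, d)."""
    diff = A[:, None, :] - B[None, :, :]
    sq_exp = v0 * np.exp(-0.5 * np.sum(w * diff ** 2, axis=2))
    return sq_exp + a0 + a1 * (A @ B.T)

def gp_predict(X, t, x_new, r2, **hyp):
    """Predictive mean a^T Q^-1 t and variance b - a^T Q^-1 a from (5)."""
    Q = cov_matrix(X, X, **hyp) + r2 * np.eye(len(X))
    a = cov_matrix(X, x_new[None, :], **hyp)[:, 0]
    b = cov_matrix(x_new[None, :], x_new[None, :], **hyp)[0, 0]
    mean = a @ np.linalg.solve(Q, t)
    var = b - a @ np.linalg.solve(Q, a)
    return mean, var

# Toy data (hypothetical): five samples of a sine wave, almost noise free.
X = np.linspace(0.0, 1.0, 5)[:, None]
t = np.sin(2.0 * np.pi * X[:, 0])
hyp = dict(v0=1.0, w=np.array([10.0]), a0=0.0, a1=0.0)
mean, var = gp_predict(X, t, X[2], r2=1e-8, **hyp)
print(mean, var)
```

With a very small noise variance the prediction at a training input reproduces the training target and the predictive variance collapses towards zero, which is a quick way to check an implementation.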

3.2  Training  a  Gaussian  Process 

There are many possible choices of prior covariance
functions. From a modeling point of view, we wish to
specify prior covariances which contain our prior beliefs
about the structure of the function we are modeling.
Formally, we are required to specify a function
which will generate a non-negative definite covariance
matrix for any set of input points. We find that the
following covariance function works well [9]:

$$C(x^{(p)}, x^{(q)}) = v_0 \exp\left\{-\frac{1}{2} \sum_{l=1}^{d} w_l (x_l^{(p)} - x_l^{(q)})^2\right\} + a_0 + a_1 \sum_{l=1}^{d} x_l^{(p)} x_l^{(q)} \qquad (6)$$

where $\theta = (a_0, a_1, w_1, \ldots, w_d, v_0, r^2)$ plays the role of
hyperparameters.

Let us assume that a form of covariance function has
been chosen, but that it depends on undetermined
hyperparameters $\theta$. We would like to learn these hyperparameters
from the training data. In a maximum
likelihood framework, we adjust the hyperparameters
so as to maximize the log likelihood of the hyperparameters:

$$\log p(D|\theta) = \log p(t^{(1)}, \ldots, t^{(n)} | x^{(1)}, \ldots, x^{(n)}, \theta)$$
$$= -\frac{1}{2} \log\det Q - \frac{1}{2} t^T Q^{-1} t - \frac{n}{2} \log 2\pi$$

It is possible to express analytically the partial derivatives
of the log likelihood, which can form the basis of
an efficient learning scheme. These derivatives are:

$$\frac{\partial}{\partial \theta_i} \log p(D|\theta) = -\frac{1}{2} \mathrm{tr}\left(Q^{-1} \frac{\partial Q}{\partial \theta_i}\right) + \frac{1}{2} t^T Q^{-1} \frac{\partial Q}{\partial \theta_i} Q^{-1} t$$

We initialize the hyperparameters to random values
(in a reasonable range) and then use an iterative
method, for example conjugate gradient, to search for
optimal values of the hyperparameters. We have found
that this approach is sometimes susceptible to local
optima, so it is advisable to try a number of random
starting positions in the hyperparameter space.
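
The log likelihood and its derivative can be checked numerically before they are handed to an optimizer. The sketch below evaluates both for the noise variance $r^2$, for which $\partial Q / \partial r^2$ is the identity matrix, and compares the analytic derivative with a central finite difference. The function names, the toy data and the fixed kernel settings are assumptions made for illustration.

```python
import numpy as np

def log_likelihood(Q, t):
    """-0.5 log det Q - 0.5 t^T Q^-1 t - (n/2) log(2 pi)."""
    L = np.linalg.cholesky(Q)                      # Q = L L^T
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    quad = t @ np.linalg.solve(Q, t)
    return -0.5 * log_det - 0.5 * quad - 0.5 * len(t) * np.log(2.0 * np.pi)

def grad_log_likelihood(Q, t, dQ):
    """-0.5 tr(Q^-1 dQ) + 0.5 t^T Q^-1 dQ Q^-1 t for one hyperparameter."""
    Q_inv = np.linalg.inv(Q)
    alpha = Q_inv @ t
    return -0.5 * np.trace(Q_inv @ dQ) + 0.5 * alpha @ dQ @ alpha

# Toy problem (hypothetical): squared exponential kernel on six inputs.
x = np.linspace(0.0, 1.0, 6)
K = np.exp(-0.5 * 10.0 * (x[:, None] - x[None, :]) ** 2)
t = np.sin(2.0 * np.pi * x)
r2, h = 0.1, 1e-5
Q_of = lambda r: K + r * np.eye(len(x))

analytic = grad_log_likelihood(Q_of(r2), t, np.eye(len(x)))
numeric = (log_likelihood(Q_of(r2 + h), t)
           - log_likelihood(Q_of(r2 - h), t)) / (2.0 * h)
print(analytic, numeric)
```

The same two functions can be reused for the other hyperparameters by supplying the corresponding derivative matrix, and a gradient-based search can then be restarted from several random initial values as suggested above.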


4  Experimental  results 

In order to compare the performance of Gaussian processes
with that of RBF networks, we consider a highly chaotic system

Figure  1:  Temporal  patterns  for  the  noisy  Mackey 
Glass  equation. 


generated by the Mackey-Glass equation:

$$\frac{dx(t)}{dt} = \frac{0.2\, x(t - \Delta)}{1 + x(t - \Delta)^{10}} - 0.1\, x(t) \qquad (7)$$


with delay $\Delta = 30$. The Mackey-Glass equation was
originally developed for modeling white blood cell
production [5], and has become a common artificial
forecasting benchmark. The difficulty associated
with this data set is its high nonlinearity. After
integrating (7), we added noise to the time series. We
obtained 500 patterns for training and 300 for testing
candidate models; the data set of 800
samples is shown in Figure 1. Patterns were generated
by windowing 6 inputs and 1 output. We conducted
experiments for different signal to noise ratios (SNR)
using Gaussian noise. We define the SNR as the
ratio between the variance of the respective noise and
that of the underlying time series.
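
A data set of this kind can be generated along the following lines. The sketch assumes simple Euler integration with unit step and a constant initial history of 1.2, neither of which is stated in the text; the function names and the noise level are likewise hypothetical.

```python
import numpy as np

def mackey_glass(n_samples, delay=30, dt=1.0, x_init=1.2):
    """Euler integration of (7), started from a constant history (assumed)."""
    lag = int(round(delay / dt))
    x = np.full(n_samples + lag, x_init)
    for k in range(lag, n_samples + lag - 1):
        x_del = x[k - lag]
        x[k + 1] = x[k] + dt * (0.2 * x_del / (1.0 + x_del ** 10) - 0.1 * x[k])
    return x[lag:]

def make_patterns(series, d=6):
    """Window the series into d inputs and the next value as the target."""
    X = np.array([series[i:i + d] for i in range(len(series) - d)])
    return X, series[d:]

rng = np.random.default_rng(0)
clean = mackey_glass(806)
noisy = clean + 0.05 * rng.standard_normal(clean.shape)  # noise level is illustrative
X, targets = make_patterns(noisy, d=6)
X_train, t_train = X[:500], targets[:500]
X_test, t_test = X[500:], targets[500:]
```

A series of 806 samples yields exactly 800 patterns with six inputs each, matching the 500 and 300 split described above.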

The RBF network uses 30 centers chosen according
to the validation set. The hyperparameters were
adapted to the training data using a conjugate gradient
search algorithm (the linear term in the covariance
function (6) involving $a_1$ was not present). Results of
prediction errors for different SNR, using GP models
and RBF networks, are given in Figure 3. This shows
that Bayesian learning using GPs performs better
than RBF networks, especially on the noisy data set.

Figure 2: Residuals given by the GP model.

5  Conclusions 

We have presented the method of forecasting with
Gaussian process models, and shown how Bayesian
learning performs better than RBF neural networks.
Gaussian process models are a simple, practical and
powerful Bayesian tool for data analysis.

Real life time series are often non-stationary, meaning
that the data distribution changes over time. For
this reason, we have also conducted some experiments on
the use of the Gaussian process with a simple non-stationary
covariance function for real temporal patterns,
although space limitations do not allow these
to be described here.

A more complicated parameterized
covariance function is currently under investigation
to improve the tracking ability of the predictors in non-stationary
cases.
ABSTRACT

The bispectrum is a higher-order statistic and is known
to be a useful tool for detecting non-linearity. A recent
succinct example of its power to identify non-linear
sound waves from broken bridge struts was given by [5].
As well as detecting non-linearity it has the further advantage
that its magnitude and shape can be used to
estimate the third order non-linear structure [1]. When
a time series is repeated (such as sound waves from a
collection of bridge struts), [2] showed how to produce
a common spectrum and to estimate individual departures
from this global quantity. The purpose of this
paper is to extend this method to the bispectrum and
give a summary of common non-linearity among repeated
time series. We evaluate our method using data
from a group of people speaking the letter ‘A’ and from
one person repeatedly speaking this letter.


1.  INTRODUCTION 

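The modulus-squared bispectrum is estimated as

$$|\hat{b}(\omega_j, \omega_k)|^2 = |H(\omega_j) H(\omega_k) H^*(\omega_j + \omega_k)|^2,$$
$$j, k = 1, \ldots, \frac{n}{2} - 1$$

where $\omega_j = 2\pi j/n$, $H(\omega) = \sum_t X_t e^{-i\omega t}$ is the Fourier
transform of a time series $X_1, \ldots, X_n$, and $*$ indicates
the complex conjugate.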

For $r$ repeated time series each coordinate $(j, k)$ of
the estimated bispectrum is modelled as

$$|b_i(\omega_j, \omega_k)| = |b_c(\omega_j, \omega_k)|\, Z_i(\omega_j, \omega_k)\, U_{ijk},$$
$$i = 1, \ldots, r, \quad j, k = 1, \ldots, \frac{n}{2} - 1 \qquad (1)$$

where $b_c(\cdot)$ is the common bispectrum, $Z_i(\cdot)$ is the perturbation
of the common spectrum for the $i$th replicate,

and $U_{ijk}$ are independently distributed error terms.
Modelling this is a two-stage process. In the first stage
the individual departure from the common bispectrum
$Z_i(\cdot)$ is estimated, then the actual realisation of the series
for a replicate is modelled through the $U_{ijk}$. This
is a logical basis as one might expect individual readings
to vary in a reasonably consistent manner from
a common quantity, and to give different readings on
any one occasion. The modelling is done on the log
scale as this transforms (1) to an additive function and
improves the behaviour of the estimated bispectrum.

The key is then to summarise the degree of heterogeneity
between repeated responses and to look for an
overall non-linear structure using the common bispectrum.

2.  COMMON  BISPECTRUM 

[2] used a parametric method to find the common spectrum,
but this rather restricts its shape as well as introducing
the chance of making a wrong decision. Ordinary
non-parametric kernel smoothing can be used
to estimate the common bispectrum, which allows the
data to govern its shape. To estimate a common bispectrum
we need a two-dimensional smoothing process,
and an ideal method was proposed by [4]. As
noted by the author the method does not work well
at the borders, and this is overcome by reflecting the
bispectrum data in the boundaries (so we now have
an area of size $h = \frac{n}{2} - 1 + 2q$). Working on the
log scale allows the use of a state space model with
$Y_{ijk} = \log b_i(\omega_j, \omega_k)$, $S_{ijk} = \log Z_i(\omega_j, \omega_k)$ and $\epsilon_{ijk} =
\log U_{ijk}$. The Kalman filter requires a vector so we set
$Y_{ik} = (Y_{i1k}, Y_{i2k}, \ldots, Y_{ihk})'$, and the filter equations
become

$$Y_{ik} = F\alpha_k + S_{ik} + \epsilon_{ik}, \quad \epsilon_{ik} \sim N(0, \sigma^2)$$
$$\alpha_k = G\alpha_{k-1} + u_k, \quad u_k \sim N\left(0, \frac{\sigma^2}{\lambda} H\right)$$


F  =  ,  0] ,  H  = 


‘  3  -1 

0 

-1  4  -1 

G  = 

-1  4 

-1 

-h 

0  -1 

3 

L  h 

0 

where 0 is the $n \times h$ zero matrix, $S_{ik}$ are the individual
specific effects, $\sigma$ and $\lambda$ control the degree of
noise in the smoothing and observation equations; $G$
is a matrix that smooths the data according to the
discrete thin plate method; $\alpha_k = (y_k', y_{k-1}')'$, where
$y_k = (y_{1k}, y_{2k}, \ldots, y_{hk})'$ is the smoothed surface.

We propose estimating the parameters $\lambda$ and $\sigma$ and
the shapes of $S_{ik}$ using a Bayesian MCMC method
with the following steps. Initial values for $\lambda_{(0)}$ and
$\sigma_{(0)}$ are taken from vague Gamma(0.5, 0.5) priors. The
initial subject and error effects are assumed to be zero,

$S_{ik(0)} = 0$, $\epsilon_{ik(0)} = 0$.

Step  1  -  Forward  sweep  of  Kalman  filter 
To  get  a  smooth  estimate  of  the  common  bispec¬ 
trum  we  first  remove  the  subject  and  error  effects  from 
the  observed  data  to  give  Y;*k  =  Yjk  -  Sjk  -  ejk- 

In  a  forward  sweep  of  the  Kalman  filter  we  calculate 
the  mean  and  variance  of  the  innovation  equation 

2 

»k+i  =  Gpk,  Rk+i  =  GC^G'  +  —  H, 

k  =  0, . . . ,  n  -  1 

with  p0  N(Y* ,Var(Y*))  and  C0  =  hn ,  so  that  the 
initial  estimates  of  each  column  are  not  null  and  neither 
are  the  variances. 

The  one-step  forecast  mean  and  variance  are  then 

-Fak+i,  Qi.k+i  =  TRk+iT'  +  cr2/,-,, 
i  =  1, . . .  ,r,  k  =  0, . . .  ,h  —  1 

We  can  then  predict  the  error  ej^+i  =  Y;*k+1  --Fak+i 
for  each  subject.  The  filtering  formula  which  runs  for 
k  =  1, . . .  ,n  —  1  is  then 


Step  2  -  Smoothing 

The  smoothing  backward  step  is  then  run  across 
n  —  1 , . . . ,  1 . 

hk  =  Pk  +  Ak  (hk+i  -  ak+i) 

Hk  =  Ck  +  Ak  (Hk+i-Rk+ijA^ 

Qk  ~  A’  (hk.  d/V/c/Hk) 

where  Ak  =  CkG'Rk+i-1-  The  initial  values  for  the 
vector  and  matrix  hr~,  and  H„  are  p„  and  C„  respec¬ 
tively. 

We  estimate  fjk  =  Yjk  —  Sjk  —  Fhk. 

Step  3  -  Update  a 

We update $\sigma$ and $\lambda$ using the Metropolis-Hastings
algorithm [3]. At the $m$-th MCMC iteration generate
$\sigma_* = |\sigma_{(m-1)} + \Phi|$, where $\Phi \sim U[-1, 1]$. The joint
likelihood for the two variance parameters is


p  (A,  a,|Yjk) 

r  n  u 

nnHYik|a,k,Sjk,^)  J}p(«*k|A,rr2)p(A,c72) 


(X 


?=1  A*— 1 

A"2/2 


w^expr2^ 


A-=l 
r  n 


^2  cik'ctk  +  A  ^2  dk'dk 


1  =  1  *=1 


fc=l 


chore 


cik  =  Yjk  ~  Sjk  -  Fa, k,  dk  =  a,k  -  GA*(k-i) 

We can safely assume our prior probabilities are independent
so that $p(\lambda, \sigma^2) = p(\lambda)\, p(\sigma^2)$ and proportional
to 1. To generate the $\alpha_{*k}$ we need to repeat
steps 1 and 2 with the updated $\sigma_*$. We then accept $\sigma_*$
with probability $\min(1, r_\sigma)$ where

f  L(  A(,„_i), cr,)  I 

Ta  =  exp  i  — - - - t  \ 

and  L(A(n,_i),cr,)  =  logp  (A(w_1), cr. |Yik) -  Otherwise 
cr(m)  =  cr(,n_1).  If  cr,  is  accepted  then  so  are  the  a*k 
and  e,jk- 
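
The proposal and acceptance rule for $\sigma$ can be sketched in isolation. The code below assumes the standard Metropolis-Hastings form, in which the log acceptance ratio is the difference of the log posteriors at the proposed and current values; this is a textbook choice and may differ in detail from the ratio used in this paper. The toy posterior, which replaces the Kalman filter likelihood with fixed Gaussian residuals under a flat prior, and all names are hypothetical.

```python
import numpy as np

def mh_update_sigma(sigma, log_post, rng):
    """One reflected random-walk Metropolis-Hastings update for sigma > 0."""
    proposal = abs(sigma + rng.uniform(-1.0, 1.0))  # sigma_* = |sigma + Phi|
    # The reflected proposal is symmetric, so only the posterior ratio is needed.
    log_r = log_post(proposal) - log_post(sigma)
    if np.log(rng.uniform()) < min(0.0, log_r):     # accept with prob min(1, r)
        return proposal, True
    return sigma, False

# Toy posterior (hypothetical): residuals c_i ~ N(0, sigma^2), flat prior.
rng = np.random.default_rng(0)
resid = 2.0 * rng.standard_normal(200)
log_post = lambda s: -len(resid) * np.log(s) - np.sum(resid ** 2) / (2.0 * s ** 2)

sigma, chain = 1.0, []
for _ in range(5000):
    sigma, _ = mh_update_sigma(sigma, log_post, rng)
    chain.append(sigma)
post_mean = np.mean(chain[1000:])   # discard burn-in before summarising
print(post_mean)
```

In the full sampler each proposal would also require rerunning the filtering and smoothing steps, so that the states are accepted or rejected together with the proposed value.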

Step  4  -  Update  A 

Using  the  same  logic  as  the  previous  step  we  now 
accept  A,  with  probability  min(l,r,\)  where 

f  kf  A, .  <7{  nj } )  I 
rx  =  exp  <  — - r  > 

L  77i  —  l )  i  ^(m)  /  ) 


1  r 

Pk+1  =  ak+i  +  -  Rk+iT'Qi,k+i  ’ej.k+i 

/  i—  1 
1  r 

Ck+l  =  Rk+l - Rk+l^'Qi.k+l  *  fRk+1 


i=l 


So  the  effect  of  the  error  eik  is  averaged  over  the  sub¬ 
jects. 


Step  5  -  Update  subject  effects 

We  again  use  rejection  sampling  to  estimate  the 
subject  effects. 

A  function  that  respects  the  symmetries  of  the  bis¬ 
pectrum  as  well  as  providing  a  range  three-dimensional 
shapes  is 

Sjjk  —  (/>o(Rj,Oi  Ulj,  CU^-)  -f-  01  (Rj.,1 ,  Ulj ,  CUfc)  (2) 




where 




A)  (  «,(> ,  ^ j  ■  )  —  Bi,0 

<l>i(Bi,i,Uj,uk)  =  sm(Bi,iUj)sm(BitiGJk) 

We  can  see  that  the  first  term  controls  the  position  on 
the  z-axis  and  the  other  term  control  the  shape.  Note 
that  setting  BiiS  =  0,  s  —  0, 1,  gives  5yfc(0)  =  0. 

Starting  with  the  first  subject  we  generate  B*i,o  = 
BLo(m— 1|  +  4>.  And  then  calculate  the  new  surface 
using  (2)  to  give  an  updated  set  of  surfaces 


Figure 1: Common bispectrum for repeated and group
data


5*jk 


^2jk(m):  •  •  •  :  ^rjk(m)] 


Terms  yet  to  be  updated  revert  to  their  previous  val¬ 
ues  so  Sijk(m)  =  Sijk (m-1),  i  =  2, ...r.  The  required 
likelihood  is 


p(S*ik|Yik)  < X 


P  (Yik|S*ik)  p  (S,ik) 


1 


wrexP 


1 

2ct2 


EE  C*ik  C*jk 


i= 1  k= 1 


where 

C*ik  —  Y;k  S*jk  F  Qk 

and  p(S*jk)  =  1.  We  accept  S*ik  with  probability 
min(l,  rs)  where 


rs  =  exp 


[  Ts(S*ik) 
Us(Sik(m)) 


} 


in figure 1. For the repeated speaker $\lambda = 0.87$, $\sigma =
2.2$, $B_{i,1} = [2.1, -0.97, -0.86, -0.24]$; for the group
data $\lambda = 0.83$, $\sigma = 3.2$, $B_{i,1} = [6.2, -4.0, 1.9, 8.9]$.
For the single speaker the subject effects are generally
smaller than for the grouped data, indicating that the bispectrum
(and hence the third order non-linearity) is
similar for all samples. For the grouped data the subject
effects are much larger, indicating that they do not
conform to an overall common bispectrum. The shape
of the population bispectrum for the single speaker is
consistent with a Bilinear model with the non-linear
term  at  lags  2  and  4  ( Xt  =  S\XtXt-2  +  fcXtXt-i). 
For  the  grouped  data  the  important  lag  appears  to  be 
at  1. 

Future  work  will  look  for  a  common  bispectrum  in 
the  Mel  frequency  Cepstral  Coefficients. 


and  Ls{ S  *ik)  —  logp(S*ik|Yik).  If  the  new  value  is 
accepted  then  —  [S* i jr- ,  S-) j k(m)  >  •  •  •  -  Er j/q m) ] ; 

Otherwise  Sijk(m)  —  [^ljk(m)^  •  •  •  >*^Yjfc(m)]*  The 

step  is  then  repeated  for  B i,i  and  then  the  procedure 
repeated  for  the  next  subject.  Again  if  a  new  s  is 
accepted  then  so  are  the  associated  a*k  and  e*ik. 

We repeat steps 1 to 5 $M$ times. We then assess
whether the estimates have converged, disregard the
initial burn-in of the chain, give estimates for $\lambda$
and $\sigma$, and plot their marginal densities. The plot of
the common bispectrum is used to identify the type of
non-linearity, whilst the degree of subject heterogeneity
is a measure of the deviation from this overall norm.

3.  RESULTS 

A group of four people were recorded speaking the letter
'A', and one person repeated the letter four times.
These signals were then resampled at 1/20 of the original
sample rate to give a shorter series. To make all
signals the same length they were tapered with zeros.
We used n = 250, r = 4, 7 = 10 and M = 200. The
common bispectra for the two data sets are shown

ABSTRACT

A low complexity algorithm for parameter estimation in
a block fading multipath DS-CDMA system is presented.
The main contribution of this paper is a novel technique
for minimizing a subspace fitting criterion which is obtained
as a large sample approximation of the Maximum
Likelihood (ML) estimator. The minimization procedure
is based on an approximation of the exact criterion
function, allowing a direct analytic solution.

For the acquisition phase, an initialization procedure
similar to alternating projections is employed which exhibits
remarkable global convergence properties. The efficacy
of the proposed method is demonstrated by means
of numerical simulations.

1.  INTRODUCTION 

One of the main concerns in CDMA systems is the near-far
problem, i.e. that the signals received from different
users have very dissimilar power levels. If the signals are
nonorthogonal, as is often the case in real systems, conventional
detectors such as the matched filter are known
to deteriorate rapidly as the ratio between the power
levels increases. To overcome this problem, a number
of near-far resistant multiuser detectors have been proposed,
see e.g. [5]. These detectors often assume various
degrees of knowledge regarding channel parameters,
such as time-delays, complex amplitudes and noise variances.
It has also been observed that the performance of
multiuser detectors is highly sensitive to the quality of
the channel parameter estimates [2], in particular to errors
in the time-delays. This has led to the development of a
number of near-far resistant time-delay estimators in the
DS-CDMA context (see e.g. [7] and references therein).

In this paper, we consider a single user approach,
where the interfering users and the background
noise are treated as temporally white Gaussian noise
with an unknown spatial covariance. By deriving a large
sample approximation of the Maximum Likelihood (ML)
estimator as in [4, 7], the resulting criterion function has
the structure of a subspace fitting problem. The novel
idea presented here is how to search for the minimum
of this function. In short, it is based on a linearization
of the criterion function [3] around a prespecified number
of points in the parameter space (typically equal to
the number of chips/symbol). In the case of a multidimensional
parameter space (multipath), the search can
be decoupled into several one-dimensional minimization
problems, in a similar fashion as in the Alternating Projection
(AP) approach described in [8]. This point obviously
has important implications with regard to the
computational complexity. Since the proposed method
is based on a linearization of a quadratic error criterion,
it is naturally interpreted as a Gauss-Newton step
performed in a number of grid points.

The following section contains a brief description of
the signal model being used, as well as relevant assumptions.
Section 3 outlines the derivation of the criterion
function to be minimized, and describes how the linearization
leads to a closed form expression for the parameter
estimates. Finally, the results of numerical simulations
are presented in Section 4, together with concluding
remarks.


2.  SYSTEM  MODEL 


Consider a K-user asynchronous DS-CDMA system operating
in a slowly fading multipath environment, i.e.
the fading is constant during the observation interval.
All transmitted symbols are members of some complex
symbol alphabet D and have duration T. The code
waveforms are assumed to be of unit energy and have
zero support outside [0, T). Each chip has duration
$T_c = T/L$, where L is the processing gain. After down
conversion, IQ-demodulation and an integrate and dump
stage with integration time $T_c$, the discrete baseband
formulation of L consecutive samples (viz. one symbol
interval) of the received signal from user k can at symbol
interval n be written as

$$\mathbf{r}_k(n) = \mathbf{H}_k \mathbf{B}_k \mathbf{z}_k(n) \tag{1}$$

where

$$\mathbf{H}_k = \begin{bmatrix} \mathbf{h}_l(\tau_{k,1}) & \mathbf{h}_r(\tau_{k,1}) & \cdots & \mathbf{h}_l(\tau_{k,R_k}) & \mathbf{h}_r(\tau_{k,R_k}) \end{bmatrix}, \quad
\mathbf{B}_k = \begin{bmatrix} \beta_{k,1} & 0 \\ 0 & \beta_{k,1} \\ \vdots & \vdots \\ \beta_{k,R_k} & 0 \\ 0 & \beta_{k,R_k} \end{bmatrix}, \quad
\mathbf{z}_k(n) = \begin{bmatrix} d_k(n-1) \\ d_k(n) \end{bmatrix}.$$

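Here, $R_k$ denotes the number of multipath components
from user k, $\beta_{k,1}$ is the complex path gain (for the first
path) and $d_k$ represents the transmitted bits. Furthermore,
$\mathbf{h}_l$ and $\mathbf{h}_r$ are functions of the path delays $\tau$ and
code waveforms $\mathbf{c}_k$ [4, 7]:

$$\mathbf{h}_l(\tau) = (1 - \delta)\, T_L^{L-p} \mathbf{c}_k + \delta\, T_L^{L-p-1} \mathbf{c}_k \tag{2}$$

$$\mathbf{h}_r(\tau) = (1 - \delta)\, T_R^{p} \mathbf{c}_k + \delta\, T_R^{p+1} \mathbf{c}_k \tag{3}$$

$$\mathbf{c}_k = [c_k(1)\ c_k(2)\ \cdots\ c_k(L)]^T \tag{4}$$

where $p = \lfloor \tau / T_c \rfloor$, $\delta = \tau / T_c - p$, and $T_L^p$ and $T_R^p$ are the
p-step left and right acyclic shift operators defined as

$$T_L^p\,[x_1, \ldots, x_N] = [x_{p+1}, \ldots, x_N, 0, \ldots, 0] \tag{5}$$

$$T_R^p\,[x_1, \ldots, x_N] = [0, \ldots, 0, x_1, \ldots, x_{N-p}]. \tag{6}$$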
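
The shift operators and the two code vectors can be sketched numerically as follows. This is an illustration under stated assumptions rather than the authors' implementation: the delay is given in units of the chip duration (so p is the integer part and delta the fractional part), the weight (1 - delta) is attached to the p-chip shift in both vectors, and the function names `shift_left`, `shift_right` and `code_vectors` are hypothetical.

```python
import numpy as np

def shift_left(x, p):
    """Acyclic p-step left shift: [x_{p+1}, ..., x_N, 0, ..., 0]."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    n = x.size
    if p < n:
        out[: n - p] = x[p:]
    return out

def shift_right(x, p):
    """Acyclic p-step right shift: [0, ..., 0, x_1, ..., x_{N-p}]."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    n = x.size
    if p < n:
        out[p:] = x[: n - p]
    return out

def code_vectors(c, tau_chips):
    """Return (h_l, h_r) for a delay 0 <= tau_chips < L given in chip units."""
    L = len(c)
    p = int(np.floor(tau_chips))      # whole chips of delay
    delta = tau_chips - p             # fractional part in [0, 1)
    # Tail of the previous symbol that leaks into the current window.
    h_l = (1 - delta) * shift_left(c, L - p) + delta * shift_left(c, L - p - 1)
    # Head of the current symbol, delayed by p (+1) chips.
    h_r = (1 - delta) * shift_right(c, p) + delta * shift_right(c, p + 1)
    return h_l, h_r

c = [1.0, 2.0, 3.0, 4.0]
h_l, h_r = code_vectors(c, 1.5)
```

For an integer delay the interpolation collapses to a single shifted copy of the code, which is a quick sanity check of the indexing.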

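Collecting the contributions from all K users, the total
received vector will be

$$\mathbf{r}(n) = \sum_{k=1}^{K} \mathbf{r}_k(n) + \mathbf{n}(n)
= \mathbf{H}_1 \mathbf{B}_1 \mathbf{z}_1(n) + \sum_{k=2}^{K} \mathbf{H}_k \mathbf{B}_k \mathbf{z}_k(n) + \mathbf{n}(n)
= \mathbf{H}_1 \mathbf{B}_1 \mathbf{z}_1(n) + \mathbf{j}(n). \tag{7}$$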


The superposition of multiuser interference and background
noise is modelled as a zero-mean complex Gaussian
random process with second-order moments

$$E\{\mathbf{j}(n_1)\,\mathbf{j}^*(n_2)\} = \mathbf{R}_{jj}\,\delta(n_1 - n_2) \tag{8}$$

$$E\{\mathbf{j}(n_1)\,\mathbf{j}^T(n_2)\} = \mathbf{0}, \tag{9}$$

where $\mathbf{R}_{jj}$ is an unknown positive definite matrix.

3.  ALGORITHM 
3.1.  Criterion  Function 

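Invoking the assumptions stated above, the negative log-likelihood
function of the received data $\{\mathbf{r}(n)\}_{n=1}^{N}$ is proportional to

$$l(\tau, \beta, \mathbf{R}_{jj}) = \log|\mathbf{R}_{jj}| + \mathrm{Tr}\left\{ \mathbf{R}_{jj}^{-1}\, \frac{1}{N} \sum_{n=1}^{N} \{\mathbf{r}(n) - \mathbf{D}\mathbf{z}(n)\}\{\mathbf{r}(n) - \mathbf{D}\mathbf{z}(n)\}^* \right\}, \tag{10}$$

where $|\cdot|$ denotes the determinant of a matrix and $\mathbf{D} = \mathbf{H}\mathbf{B}$.
The user index 1 has been dropped for notational
convenience, since we are considering an arbitrary user.
Elimination of $\mathbf{R}_{jj}$ gives the following criterion function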


$$l(\tau, \beta) = \left| \frac{1}{N} \sum_{n=1}^{N} \{\mathbf{r}(n) - \mathbf{D}\mathbf{z}(n)\}\{\mathbf{r}(n) - \mathbf{D}\mathbf{z}(n)\}^* \right| \tag{11}$$

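from which an unstructured estimate of $\mathbf{D}$ can be obtained
as [6]

$$\hat{\mathbf{D}} = \hat{\mathbf{R}}_{zr}^{*}\, \hat{\mathbf{R}}_{zz}^{-1} \tag{12}$$

and a consistent estimate of $\mathbf{R}_{jj}$ as

$$\hat{\mathbf{R}}_{jj} = \hat{\mathbf{R}}_{rr} - \hat{\mathbf{R}}_{zr}^{*}\, \hat{\mathbf{R}}_{zz}^{-1}\, \hat{\mathbf{R}}_{zr}. \tag{13}$$

Here, $\hat{\mathbf{R}}_{rr} = \frac{1}{N} \sum_{n=1}^{N} \mathbf{r}(n)\mathbf{r}^*(n)$, and $\hat{\mathbf{R}}_{zz}$ and $\hat{\mathbf{R}}_{zr}$ are
defined similarly.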

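Minimizing (11) can be shown [4] to be asymptotically
equivalent to minimizing

$$l(\tau, \beta) = \| \tilde{\mathbf{D}} - \tilde{\mathbf{H}}\mathbf{B} \|_F^2 \tag{14}$$

where $\tilde{\mathbf{D}} = \hat{\mathbf{R}}_{jj}^{-1/2} \hat{\mathbf{D}}$ and $\tilde{\mathbf{H}} = \hat{\mathbf{R}}_{jj}^{-1/2} \mathbf{H}$ are the prewhitened
estimates. By exploiting the stacked diagonal structure
of the matrix $\mathbf{B}$, the cost function can be reformulated
as

$$l(\tau, \beta) = \| \mathrm{vec}(\tilde{\mathbf{D}}) - (\mathbf{I} \otimes \tilde{\mathbf{H}})\,\mathbf{T}\beta \|^2 \tag{15}$$

where $\mathbf{T}$ is the matrix of zeros and ones that maps the path gains onto
the entries of $\mathbf{B}$, i.e. $\mathrm{vec}(\mathbf{B}) = \mathbf{T}\beta$, with

$$\mathbf{T} = \begin{bmatrix} \mathbf{I}_R \otimes \mathbf{e}_1 \\ \mathbf{I}_R \otimes \mathbf{e}_2 \end{bmatrix}, \quad
\mathbf{e}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad
\mathbf{e}_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \quad
\beta = \begin{bmatrix} \beta_1 \\ \vdots \\ \beta_R \end{bmatrix}. \tag{16}$$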


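If $\Upsilon(\tau) = (\mathbf{I} \otimes \tilde{\mathbf{H}})\mathbf{T}$, the criterion function becomes

$$l(\tau, \beta) = \| \mathrm{vec}(\tilde{\mathbf{D}}) - \Upsilon(\tau)\beta \|^2. \tag{17}$$

Furthermore, if the estimate of the complex path gains¹

$$\hat{\beta} = \Upsilon^{\dagger}(\tau)\, \mathrm{vec}(\tilde{\mathbf{D}}) \tag{18}$$

is substituted back into (17), the desired criterion, as a
function of the time-delay parameters, will finally be

$$\hat{\tau} = \arg\min_{\tau} \| \Pi^{\perp}_{\Upsilon}(\tau)\, \mathrm{vec}(\tilde{\mathbf{D}}) \|^2. \tag{19}$$

Here, $\Pi^{\perp}_{\Upsilon}(\tau) = \mathbf{I} - \Upsilon(\tau)\Upsilon^{\dagger}(\tau)$ is the projection matrix
projecting onto the orthogonal complement of span$(\Upsilon(\tau))$.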
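
The elimination of the path gains in (17)-(19) is an ordinary separable least-squares step, and a small numerical sketch may make it concrete. The helper names `path_gain_estimate` and `projected_cost` and the toy matrix below are hypothetical; the sketch only assumes that a candidate matrix Upsilon and the data vector vec(D) are available as NumPy arrays.

```python
import numpy as np

def path_gain_estimate(upsilon, d):
    """Least-squares path gains, beta_hat = pinv(Upsilon) @ d, as in (18)."""
    return np.linalg.pinv(upsilon) @ d

def projected_cost(upsilon, d):
    """Squared norm of the part of d orthogonal to span(Upsilon), as in (19)."""
    resid = d - upsilon @ (np.linalg.pinv(upsilon) @ d)
    return float(np.vdot(resid, resid).real)

# Toy example: a 3 x 2 full-rank matrix and data lying exactly in its span.
upsilon = np.array([[1.0, 0.0],
                    [0.0, 1.0],
                    [1.0, 1.0]])
beta_true = np.array([2.0, -1.0])
d = upsilon @ beta_true

beta_hat = path_gain_estimate(upsilon, d)   # recovers beta_true
cost_in_span = projected_cost(upsilon, d)   # numerically zero
```

In a delay search, `projected_cost` would be evaluated for the matrix built from each candidate delay, and the candidate with the smallest value retained.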

¹ † denotes the Moore-Penrose pseudoinverse.

3.2. Minimization Procedure

In what follows, we describe a novel approach for
minimizing the criterion in (19) that leads to a closed
form expression for the minimizing argument, $\hat{\tau}$.

Given an estimate $\tau^g$ in the close vicinity of the
global minimizer $\tau^*$, consider the first order Taylor-series
expansion of the projection matrix around $\tau^g$ (such that
$\tau^* = \tau^g + \tilde{\tau}$)

$$\Pi^{\perp}_{\Upsilon}(\tau^*) \approx \Pi^{\perp}_{\Upsilon}(\tau^g) + \sum_{i=1}^{R} \tilde{\tau}_i \left. \frac{\partial}{\partial \tau_i} \Pi^{\perp}_{\Upsilon}(\tau) \right|_{\tau = \tau^g}. \tag{20}$$

In order to evaluate (20) we need to find the derivatives
of $\Upsilon(\tau)$. So, by noting that

$$\Upsilon(\tau) = \begin{bmatrix} \hat{\mathbf{R}}_{jj}^{-1/2} & \mathbf{0} \\ \mathbf{0} & \hat{\mathbf{R}}_{jj}^{-1/2} \end{bmatrix}
\begin{bmatrix} \mathbf{h}_l(\tau_1) & \cdots & \mathbf{h}_l(\tau_R) \\ \mathbf{h}_r(\tau_1) & \cdots & \mathbf{h}_r(\tau_R) \end{bmatrix} \tag{21}$$

the derivative of $\Upsilon(\tau)$ with respect to $\tau_i$ will be

$$\mathbf{G}_i = \frac{\partial}{\partial \tau_i} \Upsilon(\tau) =
\begin{bmatrix} \mathbf{0}, \ldots, \hat{\mathbf{R}}_{jj}^{-1/2}\mathbf{g}_l(\tau_i), \ldots, \mathbf{0} \\ \mathbf{0}, \ldots, \hat{\mathbf{R}}_{jj}^{-1/2}\mathbf{g}_r(\tau_i), \ldots, \mathbf{0} \end{bmatrix}. \tag{22}$$

The derivatives of $\mathbf{h}_l$ and $\mathbf{h}_r$ with respect to $\tau$, on each
chip interval, are easily found, using (2) and (3), as

$$\frac{\partial}{\partial \tau} \mathbf{h}_l(\tau) = \mathbf{g}_l(\tau) = -T_L^{L-p}\mathbf{c} + T_L^{L-p-1}\mathbf{c} \tag{23}$$

$$\frac{\partial}{\partial \tau} \mathbf{h}_r(\tau) = \mathbf{g}_r(\tau) = -T_R^{p}\mathbf{c} + T_R^{p+1}\mathbf{c} \tag{24}$$

where $p = \lfloor \tau / T_c \rfloor$. Note that the derivative is undefined
on the chip borders so we will have a piecewise linear
derivative. With this notation, the projection matrix at
the global minimum can be written as

$$\Pi^{\perp}_{\Upsilon}(\tau^g + \tilde{\tau}) \approx \Pi^{\perp}_{\Upsilon} - \Upsilon^{\dagger *} \Delta \mathbf{G}^{*} \Pi^{\perp}_{\Upsilon} - \Pi^{\perp}_{\Upsilon} \mathbf{G} \Delta \Upsilon^{\dagger}, \tag{25}$$

where $\mathbf{G} = \sum_i \mathbf{G}_i$ and $\Delta = \mathrm{diag}(\tilde{\tau})$. The notational
dependence on $\tau^g$ has also been dropped in $\Pi^{\perp}_{\Upsilon}$, $\Upsilon$ and
$\mathbf{G}$. Hence an approximate criterion function in the variable
$\tilde{\tau}$ is obtained as

$$V(\tilde{\tau}) = \| \Pi^{\perp}_{\Upsilon}(\tau^g + \tilde{\tau})\, \mathrm{vec}(\tilde{\mathbf{D}}) \|^2. \tag{26}$$

This criterion can be minimized with respect to $\tilde{\tau}$ to
find the increments to $\tau^g$. Therefore we define

$$\mathbf{f} = \mathrm{vec}\{ \Pi^{\perp}_{\Upsilon}\, \mathrm{vec}(\tilde{\mathbf{D}}) \} \tag{27}$$

$$\mathbf{A} = (\mathbf{G}^{*} \Pi^{\perp}_{\Upsilon}\, \mathrm{vec}(\tilde{\mathbf{D}}))^{T} \diamond \Upsilon^{\dagger *} + (\Upsilon^{\dagger}\, \mathrm{vec}(\tilde{\mathbf{D}}))^{T} \diamond (\Pi^{\perp}_{\Upsilon} \mathbf{G}), \tag{28}$$

where $\diamond$ is the Khatri-Rao product, i.e. the columnwise Kronecker
product [1]. Using these definitions it is possible
to rewrite (26) as

$$V(\tilde{\tau}) = \| \mathbf{f} - \mathbf{A}\tilde{\tau} \|^2 \tag{29}$$

from which the increments $\tilde{\tau}$ are easily computed as

$$\tilde{\tau} = \mathbf{A}^{\dagger} \mathbf{f}. \tag{30}$$

Note that to enforce a real solution the real and imaginary
parts of $\mathbf{f}$ and $\mathbf{A}$ can be stacked. The new estimate
of $\tau$ will then be

$$\tau^{g+1} = \tau^g + \tilde{\tau}. \tag{31}$$

This procedure can then be iterated with the last estimate
as the new starting value.

Acquisition: In order to find the global minimum
of the criterion function (19), we propose to work on the
linearized version derived above as follows:

1) Since the derivative of the projection matrix is only
piecewise continuous with discontinuities at multiples
of the chip interval, i.e. $\tau = pT_c$, we have to evaluate
the criterion function at each chip transition as
well as solve (30) on a grid of at least L points,
i.e. one starting point per chip interval. A closer
spacing will improve performance at the expense
of increased complexity. Of all the candidate solutions
obtained in this way, we select the time-delay
corresponding to the smallest value of the criterion
function. If desirable, re-evaluate (30) with the last
estimate as the new starting point as long as a
significant improvement occurs.

2) For multipath, i.e. R > 1, the previous estimates
can be used as starting point(s), while carrying out
the same procedure as outlined above.

This is akin to the initialization procedure used in the
Alternating Projection (AP) algorithm [8], with one very
important distinction: whereas AP turns a D-dimensional
grid search into D 1-dimensional grid searches, we instead
propose to carry out what amounts to a Gauss-Newton
(GN) step in each grid point. The results of the
numerical simulations presented in the next section will
clearly demonstrate the advantage of this approach.

4. NUMERICAL SIMULATIONS


In the simulations presented here we consider a K = 5
user scenario where each user has an $R_k = 2$ path channel
and is assigned an N = 15 chips per bit Gold-like
code sequence. Both paths for the user of interest have
the same strength, i.e. $|\beta_{1,1}| = |\beta_{1,2}|$. Further, all interfering
signals have the same amplitude, defined by
the near-far ratio $|\beta_{2,1}| / |\beta_{1,1}|$, which is chosen to be either
0 or 20 dB. The proposed method, labelled GN, is
compared to the method proposed in [4], here labelled
AP. In Figure 1 the Root Mean-Square Error (RMSE)
of the estimated time-delays is plotted versus the Signal-to-Noise
Ratio SNR $= \frac{1}{\sigma^2} \sum_{r} |\beta_{1,r}|^2$. The number of
training bits is set to 50. For clarity, only the RMSE for
the first multipath component is shown, but the same
trends can also be observed for the estimates of the second
component. From the figure we note that the difference
in performance increases with the SNR, and one
might conclude that the algorithms will have comparable
performance at low SNR. This is not entirely correct
as all outliers have been removed in these plots. The
outliers correspond to those events where the estimate
is so poor, here defined as $|\hat{\tau}_1 - \tau_1| > 0.5\,T_c$, that the estimate
is useless and the acquisition fails. The probability
of acquisition failure is given in Tables 1 and 2. It can be
seen that failure to acquire the paths is only a problem in
scenarios with high levels of interference combined with
low SNR.

Figure 1: The RMSE as a function of the SNR

The performance of the estimators also depends on
the number of known symbols. In Figure 2 this is shown
for the same scenario as above. The SNR was set to
10 dB. Again, these results support the conclusion that
the proposed method for minimizing the subspace fitting
criterion function in (19) clearly outperforms the
alternating projection approach, while maintaining a
moderate complexity.

Figure 2: The RMSE as a function of the number of
known symbols


Near-far ratio    0 dB                    20 dB
path              $\hat{\tau}_1(1)$  $\hat{\tau}_1(2)$    $\hat{\tau}_1(1)$  $\hat{\tau}_1(2)$
SNR  2            2 %       1 %           26 %      9 %
SNR  8            ~0 %      0.5 %         2 %       1 %
SNR 14            ~0 %      ~0 %          ~0 %      ~0 %

Table 1: Probability of acquisition failure, AP.


Near-far ratio $|\beta_{2,1}| / |\beta_{1,1}|$    0 dB                    20 dB
path              $\hat{\tau}_1(1)$  $\hat{\tau}_1(2)$    $\hat{\tau}_1(1)$  $\hat{\tau}_1(2)$
SNR  2            1 %       1 %           10 %      4 %
SNR  8            ~0 %      ~0 %          1 %       ~0 %
SNR 14            ~0 %      ~0 %          ~0 %      ~0 %

Table 2: Probability of acquisition failure, GN.

ABSTRACT

This paper presents a method of blind source separation that jointly
exploits the nonstationarity and temporal structure of sources. The
method needs only multiple time-delayed correlation matrices of
the observation data, each of which is evaluated at a different
time-windowed data frame, to estimate the demixing matrix. We show
that the method is quite robust with respect to spatially correlated
but temporally white noise. We also discuss the extension of
some existing second-order blind source separation methods. Extensive
numerical experiments confirm the validity of the proposed
method.

1.  INTRODUCTION 

Blind source separation (BSS) is a fundamental problem that is
encountered in many practical applications such as telecommunications,
image/speech processing, and biomedical signal analysis
where multiple sensors are involved. In its simplest form, the
m-dimensional observation vector $x(t) \in \mathbb{R}^m$ is assumed to be
generated by

$$x(t) = A s(t) + v(t), \tag{1}$$

where $A \in \mathbb{R}^{m \times n}$ is the unknown mixing matrix, $s(t)$ is the
n-dimensional source vector (which is also unknown, with $n \le m$),
and $v(t)$ is the additive noise vector that is statistically independent
of $s(t)$.

A variety of methods/algorithms for BSS have been developed
over the last decade (for example, see [1] and references therein).
Although many different BSS algorithms are available, their principles
can be grouped into three distinct categories, which are
based on (1) the non-Gaussianity of the sources [2], (2) the temporal
structure of the sources [3], and (3) the nonstationarity of the sources [4].

In this paper we present methods that jointly exploit the nonstationarity
and temporal structure of sources to estimate the mixing
matrix (or the demixing matrix) in the presence of spatially
correlated but temporally white noise (not necessarily Gaussian).
Thus our methods work even for the case where multiple Gaussian
sources with no temporal correlations exist, as long as their variances
are slowly time-varying. Moreover, we show that if we use
just time-delayed correlations of the observation data, we can find
a robust estimate of the demixing matrix. To this end, we introduce
a method of robust whitening and present the Second-Order Nonstationary
source Separation (SEONS) method. We also present
the extension of some existing second-order BSS methods, which
are (1) the extended matrix pencil method and (2) the extended
Pham-Cardoso method.

Throughout this paper, the following assumptions are made:

(AS1) The mixing matrix A is of full column rank.

(AS2) Sources are spatially uncorrelated but are temporally correlated
(colored) stochastic signals with zero mean.

(AS3) Sources are second-order nonstationary signals in the sense
that their variances are time varying.

(AS4) Additive noises $\{v_i(t)\}$ are spatially correlated but temporally
white, i.e.,

$$E\{v(t)\,v^T(t - \tau)\} = \delta_{\tau}\,\Gamma, \tag{2}$$

where $\delta_{\tau}$ is the Kronecker symbol and $\Gamma$ is an arbitrary
m x m matrix.

2.  ROBUST  WHITENING 

Whitening (or data sphering) is an important pre-processing
step in a variety of BSS methods. Conventional whitening exploits
the equal-time correlation matrix of the data $x(t)$, so that
the effect of additive noise cannot be removed. The idea of robust
whitening lies in utilizing time-delayed correlation matrices
that are not sensitive to the additive white noise. The robust
whitening method is first explained for the case of stationary signals.

It follows from the assumptions (AS2) and (AS4) that the
time-delayed correlation matrix of the observation data $x(t)$ has the
form

$$R_x(\tau) = E\{x(t)\,x^T(t - \tau)\} = A R_s(\tau) A^T, \tag{3}$$

for $\tau \neq 0$. One can easily see that the transformation $R_x^{-1/2}(\tau)$
whitens the data $x(t)$ without the effect of the noise vector $v(t)$. It
reduces the noise effect and projects the data onto the signal subspace,
in contrast to the conventional whitening transformation
$R_x^{-1/2}(0)$. Some source separation methods already employ this
robust whitening transformation [5, 6, 7, 8].


0-7803-701 1-2/01/$10.00  ©2001  IEEE 




In general, however, the matrix $R_x(\tau)$ is not always positive
definite, so the whitening transformation $R_x^{-1/2}(\tau)$ may not be
valid for some time-lag $\tau$. The idea of the robust whitening is to
consider a linear combination of several time-delayed correlation
matrices, i.e.,

$$C_x = \sum_{i=1}^{K} \alpha_i M_x(\tau_i), \tag{4}$$

where

$$M_x(\tau_i) = \frac{1}{2}\left\{ R_x(\tau_i) + R_x^T(\tau_i) \right\}. \tag{5}$$

A proper choice of $\{\alpha_i\}$ may result in a positive definite matrix
$C_x$. For example, the FSGC method [9] can be used to find a set
of coefficients $\{\alpha_i\}$ such that the matrix $C_x$ is positive definite.
The matrix $C_x$ has the eigen-decomposition

$$C_x = [U_1, U_2] \begin{bmatrix} D_1 & 0 \\ 0 & D_2 \end{bmatrix} [U_1, U_2]^T, \tag{6}$$

where $U_1 \in \mathbb{R}^{m \times n}$ and $D_1 \in \mathbb{R}^{n \times n}$. Then the robust whitening
transformation matrix is given by $Q = D_1^{-1/2} U_1^T$. The transformation
$Q$ projects the data onto the n-dimensional signal subspace as
well as whitening it.

Let us denote the whitened n-dimensional data by $z(t)$:

$$z(t) = Q x(t) = B s(t) + Q v(t), \tag{7}$$

where $B = QA \in \mathbb{R}^{n \times n}$. The whitened data $z(t)$ is a unitary mixture
of sources with additive noise, i.e., $B B^T = I$.
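
The robust whitening step can be sketched as follows. This is only an illustration under stated assumptions: the coefficients of the linear combination are supplied by hand instead of being found by the FSGC method [9], the toy sources are first-order autoregressive signals with positive lag correlations so that the combination is positive definite, and the function names are hypothetical.

```python
import numpy as np

def symmetrized_lagged_correlation(x, lag):
    """Sample estimate of M_x(tau) = (R_x(tau) + R_x(tau)^T) / 2, x of shape (m, T)."""
    T = x.shape[1]
    r = x[:, lag:] @ x[:, : T - lag].T / (T - lag)   # estimate of E{x(t) x^T(t - lag)}
    return 0.5 * (r + r.T)

def robust_whitening(x, lags, alphas, n_sources):
    """Return (Q, C_x) with Q = D_1^{-1/2} U_1^T built from C_x = sum_i alpha_i M_x(tau_i)."""
    c = sum(a * symmetrized_lagged_correlation(x, lag) for a, lag in zip(alphas, lags))
    eigval, eigvec = np.linalg.eigh(c)
    order = np.argsort(eigval)[::-1][:n_sources]      # n largest eigenvalues
    d1, u1 = eigval[order], eigvec[:, order]
    if np.any(d1 <= 0):
        raise ValueError("C_x is not positive definite on the signal subspace")
    q = (u1 / np.sqrt(d1)).T                          # D_1^{-1/2} U_1^T
    return q, c

# Toy data: 3 AR(1) sources, 4 sensors, weak temporally white noise.
rng = np.random.default_rng(0)
T = 5000
rho = np.array([0.9, 0.8, 0.7])
e = rng.standard_normal((3, T))
s = np.zeros((3, T))
for t in range(1, T):
    s[:, t] = rho * s[:, t - 1] + e[:, t]
A = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])
x = A @ s + 0.1 * rng.standard_normal((4, T))

Q, C = robust_whitening(x, lags=[1, 2], alphas=[0.5, 0.5], n_sources=3)
z = Q @ x   # whitened 3-dimensional data
```

By construction Q diagonalizes the chosen combination of lagged correlation matrices to the identity; because only nonzero lags are used, the temporally white noise does not enter that combination in expectation.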

3. SECOND-ORDER NONSTATIONARY SOURCE SEPARATION

This section describes our main method, SEONS, as well as some
extensions such as the extended matrix pencil method and the extended
Pham-Cardoso method.

Now we consider the case where sources are second-order
nonstationary and have non-vanishing temporal correlations. It
follows from the assumptions (AS1)-(AS4) that we have

$$M_x(t_k, \tau_i) = A M_s(t_k, \tau_i) A^T, \tag{8}$$

for $\tau_i \neq 0$, where the index $t_k$ denotes time since we deal with
nonstationary sources. In practice $M_x(t_k, \tau_i)$ is computed using the
samples in the kth time-windowed data frame, i.e.,

$$R_x(t_k, \tau_i) = \frac{1}{N_k} \sum_{t \in \mathcal{N}_k} x(t)\,x^T(t - \tau_i), \tag{9}$$

$$M_x(t_k, \tau_i) = \frac{1}{2}\left\{ R_x(t_k, \tau_i) + R_x^T(t_k, \tau_i) \right\}, \tag{10}$$

where $\mathcal{N}_k$ contains the data points in the kth time-windowed frame
and $N_k$ is the number of data points in $\mathcal{N}_k$.

The matrix pencil method [4] was applied to the blind separation
of temporally colored sources. In general, however, the
pencil that consists of two time-delayed correlation matrices is not
a symmetric definite pencil, which may cause some numerical problems
in calculating generalized eigenvectors. The extended matrix
pencil method (which is described below) employs a symmetric
definite pencil.

Algorithm Outline: Extended Matrix Pencil Method (nonstationary case)

1. Partition the observation data into two non-overlapping
blocks, $\{\mathcal{N}_1, \mathcal{N}_2\}$.

2. Compute $M_x(t_2, \tau_2)$ for some time-lag $\tau_2 \neq 0$ using the
data points in $\mathcal{N}_2$.

3. Calculate the matrix $C_1(t_1) = \sum_i \alpha_i M_x(t_1, \tau_i)$ by the
FSGC method using the data points in $\mathcal{N}_1$.

4. Find the generalized eigenvector matrix $V$ of the pencil
$M_x(t_2, \tau_2) - \lambda C_1(t_1)$ which satisfies

$$M_x(t_2, \tau_2)\, V = C_1(t_1)\, V \Lambda. \tag{11}$$

5. The demixing matrix is given by $W = V^T$.

In order to improve the statistical efficiency, we can employ
a joint approximate diagonalization method [10], as in JADE
[11] and SOBI [3]. The joint approximate diagonalization method
in [10] finds a unitary transformation that jointly diagonalizes
several matrices (which do not have to be symmetric or positive
definite). The SEONS method is based on this joint approximate
diagonalization. In this sense SEONS includes SOBI as a
special case (if the sources are stationary). The algorithm is summarized
below.

Algorithm Outline: SEONS

1. The robust whitening method (described in Section 2) is
applied to obtain the whitened vector $z(t) = Q x(t)$. In
the robust whitening step, we use all the available data
points.

2. Divide the whitened data $\{z(t)\}$ into K non-overlapping
blocks and calculate $M_z(t_k, \tau_j)$ for $k = 1, \ldots, K$ and
$j = 1, \ldots, J$. In other words, at each time-windowed
data frame, we compute J different time-delayed correlation
matrices of $z(t)$.

3. Find a unitary joint diagonalizer $V$ of $\{M_z(t_k, \tau_j)\}$ using
the joint approximate diagonalization method in [10],
which satisfies

$$V^T M_z(t_k, \tau_j)\, V = \Lambda_{k,j}, \tag{12}$$

where $\{\Lambda_{k,j}\}$ is a set of diagonal matrices.

4. The demixing matrix is computed as $W = V^T Q$.
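
Steps 4 and 5 of the extended matrix pencil outline amount to a symmetric definite generalized eigenproblem, which can be sketched with NumPy alone. The sketch makes several assumptions that are not part of the paper: the two matrices are built exactly from a known toy mixing matrix instead of being estimated from data blocks, the positive definite matrix is simply given rather than obtained by the FSGC method, and the function name `pencil_demixing` is hypothetical.

```python
import numpy as np

def pencil_demixing(m2, c1):
    """Solve m2 V = c1 V Lambda for symmetric m2 and positive definite c1; return W = V^T."""
    chol = np.linalg.cholesky(c1)            # c1 = chol @ chol.T
    chol_inv = np.linalg.inv(chol)
    s = chol_inv @ m2 @ chol_inv.T           # reduce to a standard symmetric eigenproblem
    s = 0.5 * (s + s.T)                      # remove round-off asymmetry
    _, e = np.linalg.eigh(s)
    v = chol_inv.T @ e                       # generalized eigenvectors
    return v.T

# Toy model: both matrices share the structure A * diag * A^T.
a = np.array([[1.0, 0.5, 0.2],
              [0.3, 1.0, 0.4],
              [0.6, 0.1, 1.0]])
c1 = a @ np.diag([1.0, 2.0, 3.0]) @ a.T      # positive definite combination (block 1)
m2 = a @ np.diag([0.5, -1.0, 4.5]) @ a.T     # lagged correlation (block 2), may be indefinite

w = pencil_demixing(m2, c1)
g = w @ a                                    # global system matrix, ideally a scaled permutation
```

Separation requires the generalized eigenvalues (here 0.5, -0.5 and 1.5) to be distinct, which is where the nonstationarity or the differing temporal structure of the sources enters.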

Recently Pham [12] developed a joint approximate diagonalization
method where a non-unitary joint diagonalizer of several Hermitian
positive matrices is computed in a way similar to the classical
Jacobi method. Second-order nonstationarity was also exploited
in [13], but only noise-free data was considered. In order
to extend the Pham-Cardoso algorithm to the case of noisy data,
we employ a linear combination of multiple time-delayed correlation
matrices which is ensured to be positive definite, at each data
block. The method is referred to as the extended Pham-Cardoso
method (which is summarized below). One advantage of the extended
Pham-Cardoso method is that it does not require the whitening step
because the joint approximate diagonalization method in [13] finds a
non-unitary joint diagonalizer. However, its drawback lies in the
fact that it requires the set of matrices to be Hermitian and positive
definite, so we need to find a linear combination of time-delayed
correlation matrices that is positive definite at each data frame,
which increases the computational complexity.

Algorithm Outline: Extended Pham-Cardoso

1. Divide the data $\{x(t)\}$ into K non-overlapping blocks and
calculate $M_x(t_k, \tau_j)$ for $k = 1, \ldots, K$ and $j = 1, \ldots, J$.

2. At each data frame, we compute

$$C_k = \sum_{i=1}^{J} \alpha_i^{(k)} M_x(t_k, \tau_i) \tag{13}$$

by the FSGC method for $k = 1, \ldots, K$. Note that each $C_k$ is
symmetric and positive definite.

3. Find a non-unitary joint diagonalizer $V$ of $\{C_k\}$ using the
joint approximate diagonalization method in [12], which
satisfies

$$V C_k V^T = \Lambda_k, \tag{14}$$

where $\{\Lambda_k\}$ is a set of diagonal matrices.

4. The demixing matrix is computed as $W = V$.


4.  NUMERICAL  EXPERIMENTS 


Several numerical experimental results are presented to evaluate
the performance of our method (SEONS) and to compare it with
some existing methods such as JADE [11], SOBI [3], matrix pencil
methods [4], and Pham-Cardoso [13]. Through numerical experiments,
we confirm the useful behavior of the proposed method,
SEONS, in two cases: (1) the case where several nonstationary
Gaussian sources exist and each Gaussian source has no temporal
correlation; (2) the case where additive noises are spatially correlated
but temporally white Gaussian processes.

In order to measure the performance of algorithms, we use the
performance index (PI) defined by

$$\mathrm{PI} = \frac{1}{n(n-1)} \sum_{i=1}^{n} \left\{ \left( \sum_{k=1}^{n} \frac{|g_{ik}|}{\max_j |g_{ij}|} - 1 \right) + \left( \sum_{k=1}^{n} \frac{|g_{ki}|}{\max_j |g_{ji}|} - 1 \right) \right\} \tag{15}$$

where $g_{ij}$ is the (i, j)-element of the global system matrix $G = WA$,
$\max_j |g_{ij}|$ represents the maximum value among the elements
in the ith row vector of $G$, and $\max_j |g_{ji}|$ the maximum
value among the elements in the ith column vector of $G$. When
perfect separation is achieved, the performance index is zero. In
practice, a value of the performance index around $10^{-3}$ indicates quite
a good performance.
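
A minimal sketch of the performance index follows, written directly from the description above (normalized row and column sums of the magnitudes of the global system matrix, each reduced by one). The function name `performance_index` is hypothetical.

```python
import numpy as np

def performance_index(g):
    """Performance index of a square global system matrix G = W A."""
    g = np.abs(np.asarray(g, dtype=float))
    n = g.shape[0]
    # Row part: sum_k |g_ik| / max_j |g_ij| - 1, summed over rows i.
    row_term = (g.sum(axis=1) / g.max(axis=1) - 1.0).sum()
    # Column part: sum_k |g_ki| / max_j |g_ji| - 1, summed over columns i.
    col_term = (g.sum(axis=0) / g.max(axis=0) - 1.0).sum()
    return (row_term + col_term) / (n * (n - 1))

# A scaled, sign-flipped permutation corresponds to perfect separation.
g_perfect = np.array([[0.0, 2.0, 0.0],
                      [0.0, 0.0, -3.0],
                      [0.5, 0.0, 0.0]])
pi_perfect = performance_index(g_perfect)
```

The index is invariant to the scaling, sign and ordering of the recovered sources, which are the inherent ambiguities of blind source separation.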


4.1.  Experiment  1 

The first experiment was designed to evaluate the effectiveness of
the proposed method in the presence of several Gaussian signals.
In this experiment, we used three speech signals that are sampled
at 8 kHz and two Gaussian signals (with no temporal correlations)
whose variances are slowly varying. These 5 sources were mixed
using a randomly generated 5 x 5 mixing matrix to generate a
5-dimensional observation vector with 10000 data points. No measurement
noise was added.


In this experiment, we compared SEONS with JADE, SOBI,
and Pham-Cardoso [13]. It is expected that the performance of
JADE and SOBI is degraded because of the presence of two white
Gaussian sources. The result is shown in Fig. 1, in which the Hinton
diagram of the global system matrix G is plotted. In a Hinton
diagram, each square's area represents the magnitude of the element
of the matrix and each square's color represents the sign of
the element (red for a negative value and green for a positive value).
For successful separation, each row and column has only one dominant
square (regardless of its color). Small squares contribute to
performance degradation. One can observe that SEONS and
Pham-Cardoso work well even in the presence of nonstationary Gaussian
sources (see (a) and (b) in Fig. 1), compared to JADE and SOBI
(see (c) and (d) in Fig. 1). For the case of JADE, the first and last
rows of G have a relatively big square besides the dominant square,
which confirms that the two white Gaussian sources are difficult to
separate. SOBI gives slightly better performance than
JADE, but its performance is not comparable to that of SEONS (see the
first and fourth rows of G, (d) in Fig. 1).

The  following  parameters  were  used  in  this  experiment: 

•  In SEONS and Pham-Cardoso, we partitioned the whole
data (10000 data points) into 100 different frames of data
(each frame contains 100 data points) to calculate 100 different
equal-time correlation matrices. These matrices were
used to estimate the demixing matrix.

•  In SOBI, we used 20 different time-delayed correlation matrices
to estimate the demixing matrix.

4.2.  Experiment  2 

The second experiment was designed to show the robustness of
SEONS in the presence of spatially correlated but temporally
white noise. We used 3 digitized voice signals and 2 music signals,
all of which were sampled at 8 kHz. All the elements of the mixing
matrix $A \in \mathbb{R}^{5 \times 5}$ were drawn from a standardized
Gaussian distribution (i.e., zero mean and unit variance). As in
Experiment 1, the whole data set has 10000 samples.

The algorithms that are tested in this experiment include the
extended matrix pencil method (Extended MP), SEONS, extended
Pham-Cardoso, JADE, SOBI, and SOBI with the robust whitening
method [8] (see Fig. 2). In SEONS, we partitioned the data into
50 non-overlapping blocks (each frame has 200 data points). The
robust whitening was performed using a combination of 5
time-delayed correlation matrices (with time-lags {1, 2, ..., 5}). In
each data frame, we computed 5 time-delayed correlation matrices.
The joint approximate diagonalizer of 250 correlation matrices
(5 for each block, i.e. 5 x 50) was computed to estimate the
demixing matrix.

At high SNR, most of the algorithms worked very well, except for
the extended MP method since it uses only two matrices. At low
SNR, one can observe that SOBI with robust whitening outperforms
SOBI without whitening. SEONS gives slightly
better performance than SOBI with robust whitening over most
of the SNR range. In the range between 0 and 6 dB, SEONS is
worse than SOBI with robust whitening. This might result from
the fact that SEONS takes only 200 data points to calculate
the time-delayed correlation matrices, so the temporal whiteness
of the noise vector is not really satisfied. One can use fewer
blocks (so more data points for each block) to reduce this drawback.
The advantage of SEONS over SOBI with robust whitening
lies in the fact that the former works even for the case of nonstationary
sources with identical spectral shapes, whereas the latter
does not (see the result of Experiment 1).

Figure 1: Hinton diagrams of global system matrices: (a) SEONS; (b) Pham-Cardoso; (c) JADE; (d) SOBI, with PI .001, .001, .05, .01,
respectively.


Figure  2:  The  performance  comparison  for  SEONS,  SOBI,  SOBI 
with  robust  whitening,  extended  MP,  JADE,  and  extended  Pham- 
Cardoso. 


5.  CONCLUSION 

In this paper we have presented a BSS method that jointly exploits the nonstationarity and temporal structure of sources. We have shown that our method, SEONS, was robust with respect to temporally white noise and worked well even for the case of several nonstationary Gaussian sources (with no temporal correlations).
ABSTRACT

Recovering independent source signals from their convolutive mixtures without any a priori knowledge on their structure represents a great challenge in signal processing. In this paper, we present an efficient solution that is based on the joint block-diagonalization of positive spatio-temporal covariance matrices. In the case of instantaneous mixtures, robust solutions have been proposed previously. Taking advantage of possible non-stationarity of the sources, this new technique uses only second order statistics. The new approach has been successfully applied to the separation of speech signals.

1.  INTRODUCTION 

If we consider a set of received signals that are convolutive mixtures of independent source signals, the objective of blind separation is to recover the source signals from the set of received signals without any knowledge of the linear mixtures or the Linear Time Invariant (LTI) systems. For instantaneous mixtures, a Second Order Blind Identification (SOBI) algorithm has been presented [1] and shown to be very robust for temporally correlated sources. An analogous technique based on the Block Gaussian likelihood, presented by Pham [2], uses a joint diagonalization of positive correlation matrices of the received data. An extension of the SOBI technique to convolutive mixtures has been considered in [3, 4].

When dealing with convolutive mixtures, classical blind separation can be achieved in two ways. One way is to first identify the channel system from the output mixtures and then to design an equalizer accordingly [5]. The other way consists of directly designing an equalizer from the output mixtures. The latter bypasses the problem of blind system identification and is computationally less expensive. Herein, we consider the separation of the source signals up to a scalar filter instead of a full deconvolution. For this purpose, we propose to extend the Block Gaussian likelihood technique [2] to the convolutive mixture case. It is based on the joint block-diagonalization of positive spatio-temporal covariance matrices of the received data. In this contribution, the measure of block-diagonality is directly related to the likelihood objective function and is optimized without any orthogonality constraint, which avoids any prior whitening of the observations. The proposed method has been successfully applied to the separation of speech signals up to a scalar filter. In the next sections, we present the data model and describe the proposed algorithm. Finally, some simulation results are provided in Section 5.

2.  DATA  MODEL 

For simplicity, we shall restrict ourselves to the simplest discrete-time multiple input multiple output (MIMO) linear time invariant model given by,

$$x_i(n) = \sum_{j=1}^{M} \sum_{l=0}^{L-1} h_{ij}(l)\, s_j(n-l), \quad \text{for } i = 1, \cdots, N \qquad (1)$$

where $s_j(n)$, $j = 1, \cdots, M$, are the M source signals (model inputs), $x_i(n)$, $i = 1, \cdots, N$, are the N sensor signals (model outputs), and $h_{ij}$ is the transfer function, with an overall duration L, between the j-th source and the i-th sensor.
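As a minimal sketch of the MIMO model (1), the sensor signals can be simulated by convolving each source with the corresponding channel and summing over sources. The function name `convolutive_mix`, the array layout of `h`, and the random sources and channel taps are assumptions made for illustration only; sources are taken to be zero before n = 0.

```python
import numpy as np

def convolutive_mix(sources, h):
    """Simulate x_i(n) = sum_j sum_l h_ij(l) s_j(n - l).

    sources : array of shape (M, T), one row per source
    h       : array of shape (N, M, L), h[i, j, l] = h_ij(l)
    Returns the sensor signals as an array of shape (N, T).
    """
    n_sources, n_samples = sources.shape
    n_sensors, n_sources_h, _ = h.shape
    assert n_sources == n_sources_h, "channel array does not match source count"
    x = np.zeros((n_sensors, n_samples))
    for i in range(n_sensors):
        for j in range(n_sources):
            # Full convolution truncated to T samples (zero initial conditions).
            x[i] += np.convolve(sources[j], h[i, j])[:n_samples]
    return x

# Hypothetical sizes: M = 2 sources, N = 3 sensors, channel duration L = 3.
rng = np.random.default_rng(1)
s = rng.standard_normal((2, 1000))
h = rng.standard_normal((3, 2, 3))
x = convolutive_mix(s, h)
```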

The assumptions made about the data model are as follows:

A1) The source signals $s_j(n)$, $j = 1, \cdots, M$, are mutually decorrelated.

A2) Each source signal is non-stationary.

A3) The channel matrix H defined in (3) is full column rank.

The purpose of blind source separation is to recover the source signals based only on the sensor signals. In some applications, such as speech processing, the separation of the sources up to a scalar filter is sufficient. In this paper, we consider the problem of source separation up to a scalar filter instead of the full MIMO deconvolution procedure. We can rewrite equation (1) in the following matrix form,

$$\mathbf{x}(n) = \mathbf{H}\,\mathbf{s}(n) \qquad (2)$$

where

$$\mathbf{s}(n) = [s_1(n), \cdots, s_1(n-(L+L'-1)+1), \cdots, s_M(n), \cdots, s_M(n-(L+L'-1)+1)]^T$$

$$\mathbf{x}(n) = [x_1(n), \cdots, x_1(n-L'+1), \cdots, x_N(n), \cdots, x_N(n-L'+1)]^T$$

Superscript T denotes the transpose of a vector, and:

$$\mathbf{H} = \begin{bmatrix} \mathbf{H}_{11} & \cdots & \mathbf{H}_{1M} \\ \vdots & \ddots & \vdots \\ \mathbf{H}_{N1} & \cdots & \mathbf{H}_{NM} \end{bmatrix} \qquad (3)$$

with

$$\mathbf{H}_{ij} = \begin{bmatrix} h_{ij}(0) & \cdots & h_{ij}(L-1) & \cdots & 0 \\ & \ddots & & \ddots & \\ 0 & \cdots & h_{ij}(0) & \cdots & h_{ij}(L-1) \end{bmatrix}$$

Note that $\mathbf{H}$ is an $[NL' \times M(L+L'-1)]$ matrix and the $\mathbf{H}_{ij}$ are $[L' \times (L+L'-1)]$ matrices. $L'$ is chosen such that $NL' \ge M(L+L'-1)$.

We assume that $\mathbf{H}$ is a square matrix, i.e., $NL' = M(L+L'-1)$; if not, it can be made square by projecting the sensor data $\mathbf{x}(n)$ onto the source subspace.

3. THE PROPOSED ALGORITHM

In this section, we extend the Block Gaussian likelihood technique [2] to the convolutive mixture case. It is based on the joint block-diagonalization of positive spatio-temporal covariance matrices of the received data.

The interval [0, T] may be divided into K consecutive sub-intervals $T_1, \cdots, T_K$ such that the approximate covariance matrix of the received data in the sub-interval $T_k$ is given by:

$$\mathbf{R}_x(k) = \frac{1}{n_{T_k}} \sum_{n \in T_k} \mathbf{x}(n)\,\mathbf{x}(n)^* \qquad (4)$$

where $n_{T_k}$ is the number of elements (samples) in the sub-interval $T_k$ and superscript * denotes the conjugate transpose of a vector. Implicitly, we assume approximate local stationarity in each data sub-block.

Under the linear model (2), the above equation can be put in the following form:

$$\mathbf{R}_x(k) = \mathbf{H}\,\mathbf{R}_s(k)\,\mathbf{H}^H \qquad (5)$$

where $\mathbf{R}_s(k)$ are the approximate covariance matrices of the source signals and superscript H denotes the conjugate transpose of a matrix. Taking advantage of the mutual decorrelation of the source signals, $\mathbf{R}_s(k)$ is approximately block diagonal, with M diagonal blocks of dimension $(L+L'-1) \times (L+L'-1)$ each, i.e.

$$\mathbf{R}_s(k) \approx \begin{bmatrix} \mathbf{R}_{s_1}(k) & 0 & \cdots & 0 \\ 0 & \mathbf{R}_{s_2}(k) & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \mathbf{R}_{s_M}(k) \end{bmatrix} \qquad (6)$$

$\mathbf{R}_{s_1}(k), \mathbf{R}_{s_2}(k), \cdots, \mathbf{R}_{s_M}(k)$ are the 'local' covariance matrices of the M sources, k being the data sub-block index. Equations (5) and (6) just mean that any data covariance matrix is block-diagonal in the basis of the column vectors of matrix $\mathbf{H}$, which can be retrieved by computing the joint block-diagonalization of a set of K covariance matrices $\mathbf{R}_x(k)$, $k = 1, \cdots, K$.

4. A JOINT BLOCK-DIAGONALIZATION CRITERION

In this section, we derive a joint block-diagonalization criterion inspired by the joint diagonalization criterion of [2]. Using the Kullback-Leibler divergence between two zero-mean multivariate normal densities with covariance matrices $\mathbf{R}_a$ and $\mathbf{R}_b$, respectively, a deviation $D(\mathbf{R}_a, \mathbf{R}_b)$ between $\mathbf{R}_a$ and $\mathbf{R}_b$ is defined, which satisfies:

$$D(\mathbf{R}_a, \mathbf{R}_b) \ge 0 \qquad (7)$$

with equality if and only if $\mathbf{R}_a = \mathbf{R}_b$, and thus is a legitimate measure of deviation between positive definite matrices. Therefore, a measure of deviation from block-diagonalization could be derived from:

$$D(\mathbf{H}^{-1}\mathbf{R}_x(k)\mathbf{H}^{-H}, \mathbf{R}_s(k)) \qquad (8)$$

Following the same steps as in [2], minimizing the above measure of deviation is equivalent to minimizing

$$\sum_{k=1}^{K} \left[\log\det(\mathrm{bdiag}(\mathbf{M}_k)) - \log\det(\mathbf{M}_k)\right] \qquad (9)$$

with

$$\mathbf{M}_k = \mathbf{B}\,\mathbf{R}_x(k)\,\mathbf{B}^H \qquad (10)$$

over the set of matrices $\mathbf{B}$, where $\mathrm{bdiag}(\mathbf{M}_k)$ denotes the block-diagonal matrix with the same diagonal blocks of size $(L+L'-1) \times (L+L'-1)$ as $\mathbf{M}_k$.

From the generalized Hadamard inequality [6] and for Hermitian positive definite matrices:

$$\det(\mathbf{M}_k) \le \det(\mathrm{bdiag}(\mathbf{M}_k)) \qquad (11)$$

with equality if and only if $\mathbf{M}_k$ is block-diagonal.

It follows that criterion (9) is a measure of the global deviation of the matrices from block-diagonal structure. Hence, minimization of (9) leads to,

$$\mathbf{B} \approx \mathbf{D}\,\mathbf{H}^{-1} \qquad (12)$$

where the matrix $\mathbf{D}$ is an arbitrary block-diagonal matrix coming from the inherent indeterminacy of the joint block-diagonalization problem.

Once the matrix $\mathbf{B}$ is determined, the recovered signals are obtained up to a filter by,

$$\hat{\mathbf{s}}(n) = \mathbf{B}\,\mathbf{x}(n) \qquad (13)$$

Accordingly, the recovered signals will verify,

$$\hat{\mathbf{s}}(n) = \mathbf{D}\,\mathbf{s}(n) \qquad (14)$$
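To make criterion (9) concrete, the following sketch evaluates it for a candidate matrix B and a set of covariance matrices. It only evaluates the criterion; it is not the authors' optimization procedure, which the text does not detail. The helper names `bdiag` and `block_diag_criterion`, the toy mixing matrix, and the toy source covariances are all assumptions made for illustration.

```python
import numpy as np

def bdiag(m, block):
    """Keep the diagonal blocks of size block x block and zero everything else."""
    out = np.zeros_like(m)
    for start in range(0, m.shape[0], block):
        sl = slice(start, start + block)
        out[sl, sl] = m[sl, sl]
    return out

def block_diag_criterion(b, r_x_list, block):
    """Sum over k of log det(bdiag(M_k)) - log det(M_k), with M_k = B R_x(k) B^H."""
    total = 0.0
    for r_x in r_x_list:
        m_k = b @ r_x @ b.conj().T
        _, logdet_bd = np.linalg.slogdet(bdiag(m_k, block))
        _, logdet_m = np.linalg.slogdet(m_k)
        total += logdet_bd - logdet_m
    return float(total)

# Toy setting (assumed): two blocks of size 2 and an invertible 4 x 4 mixing matrix.
h = np.array([[1.0, 0.0, 0.5, 0.0],
              [0.0, 1.0, 0.0, 0.5],
              [0.5, 0.0, 1.0, 0.0],
              [0.0, 0.5, 0.0, 1.0]])
r_x_list = []
for k in (1.0, 2.0, 3.0):
    r_s = np.zeros((4, 4))  # exactly block-diagonal source covariance
    r_s[:2, :2] = k * np.array([[2.0, 0.5], [0.5, 1.0]])
    r_s[2:, 2:] = np.array([[1.0, 0.2], [0.2, 3.0]]) / k
    r_x_list.append(h @ r_s @ h.T)

crit_at_inverse = block_diag_criterion(np.linalg.inv(h), r_x_list, block=2)
crit_at_identity = block_diag_criterion(np.eye(4), r_x_list, block=2)
```

In this toy case the criterion vanishes (up to rounding) at B equal to the inverse of the mixing matrix and is strictly positive at B equal to the identity, as inequality (11) predicts.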


Fig. 1. Original speech signals.


5.  SIMULATIONS 


We present here a simulation to illustrate the effectiveness of our algorithm in separating speech signals. The parameter settings are:

1. M = 2, N = 3, L = 3 and L' = 4.

2. The two speech signals are sampled at 8 kHz.

3. The transfer function matrix of the simulated multichannel system is given by,

$$\mathbf{H}(z) = \begin{bmatrix} 1 + 0.5z^{-1} + 0.7z^{-2} & 0.1z^{-1} + 0.85z^{-2} \\ 0.8 + 0.7z^{-1} + 0.4z^{-2} & 1 + 0.9z^{-1} \\ 1 + 0.5z^{-1} + 0.3z^{-2} & 0.7 + 0.85z^{-1} + 0.1z^{-2} \end{bmatrix}$$

Figures 1, 2 and 3 show a sample run of the proposed algorithm. Note that only two speech signals among the twelve recovered ones are displayed. These two signals lead to the smallest correlation coefficients.


6.  CONCLUSION 

In this contribution, we considered the problem of the blind separation of convolutive mixtures of non-stationary source signals. We proposed a solution based on the joint block-diagonalization of positive spatio-temporal covariance matrices. This technique uses only second order statistics and, unlike [3, 4], has no orthogonality constraint, which avoids any prior whitening of the data. This method is well suited to the deconvolution of speech signals, which is of great importance in practical applications [7].


Fig. 2. Mixed speech signals.

Fig. 3. Two of the twelve recovered speech signals.

[7] F. Ehlers and H. G. Schuster, “Blind separation of convolutive mixtures and an application in automatic speech recognition in a noisy environment,” IEEE Trans. on SP, vol. 45, pp. 2608-2612, Oct. 1997.
ABSTRACT

Blind source separation (BSS) of independent sources from their convolutive mixtures is a problem in many real-world multi-sensor applications. In this paper, we propose an improved BSS method for audio signals based on the ICA (Independent Component Analysis) technique. The improvement is obtained by implementing non-causal filters instead of causal filters within the feedback network of the ICA based BSS method. It reduces the required length of the unmixing filters considerably and provides better results and faster convergence than the conventional causal filters. The proposed method has been simulated and compared using real-world audio signals.

1.  INTRODUCTION 

Blind signal separation refers to performing inverse channel (or unmixing filter) estimation despite having no knowledge about the true channel (or mixing filter). The word “blind” refers to the fact that the independent original source signals and the mixing process are unknown.

A typical scenario would be to record two people talking at the same time using two microphones. The recorded signals would then, of course, consist of a mixture of the two speech signals. The applied algorithm then tries to estimate the inverse channel and force the recorded signals to be independent of each other (in order to separate the signals).

The BSS method based on the ICA (independent component analysis) technique has been found effective in signal separation compared with other BSS methods. The serious limitation of this technique is the requirement of long unmixing filters in order to estimate the inverse channels [1].

The objective of this paper is thus to improve an ICA based BSS method by reducing the length of the unmixing filters. This can be achieved by implementing non-causal filters instead of conventional causal filters within the feedback network of the ICA based BSS method. The non-causal filters within the feedback loop are able to reduce the length of the unmixing/separation filters, while improving the results of the source separation by reducing the whitening effect (i.e. the method is not sensitive to whitening in the inversion of a non-minimum phase system). The feedback network with the non-causal filters is then able to invert the mixing even if the direct paths are not “good”, i.e. when the direct channel filters are not guaranteed to have a stable inverse. Moreover, for adaptation of the learning process, a variable step-size parameter is adopted, providing stable convergence.

2.  BACKGROUND  OF  THE  BSS  ALGORITHM 
2.1.  “Infomax”  or  Entropy  Maximization  Criterion 

BSS is the main application of independent component analysis (ICA), which reduces redundancy between source signals and makes them “as independent as possible”. In BSS, second order statistics are inadequate to reduce redundancy between the input signals. Higher-order statistics are required for redundancy reduction and these are determined mainly in two ways. The first is the explicit estimation of the cumulants and polyspectra [2]. The second is by obtaining higher-order statistics through the use of static nonlinear functions [3].

Bell and Sejnowski [4] proposed an information-theoretic approach for blind source separation (BSS), which is referred to as the “Infomax algorithm”. Information theory can be used to unify several lines of research [5] and different theories recently proposed for independent component analysis (ICA), leading to the same iterative learning algorithm for BSS.


2.2.  Separation  of  Convolutive  Mixture 

The initial algorithm of Bell and Sejnowski [4] deals with the instantaneous mixture problem. The algorithm was further extended by Torkkola to the convolutive mixture problem. Given measured signals, which are combinations of independent sources, the aim of blind separation is to produce outputs that recreate the source signals, i.e., $y_1(k) = s_1(k)$, $y_2(k) = s_2(k)$, ..., $y_n(k) = s_n(k)$. Nothing can be assumed about the sources except that they are statistically independent. Torkkola [6] suggested the feedback structure for the separation of convolutive mixtures (see also [5]). The nonlinear function, f, must be a monotonically increasing or decreasing function. In this paper the nonlinear function used is defined as $y = f(u) = 1/(1 + e^{-u})$. The learning rule for the convolutive mixture can be derived following the same steps as in the instantaneous case [4]. Minimizing the mutual information between outputs $y_1$ and $y_2$ can be achieved by maximizing the entropy at the output [5]. Assuming causal FIR filters for $w^{ij}$, the network performs the following operations in the time domain:

$$u_1(t) = \sum_{k=0}^{L_{11}} w_k^{11} x_1(t-k) + \sum_{k=1}^{L_{12}} w_k^{12} u_2(t-k)$$

$$u_2(t) = \sum_{k=0}^{L_{22}} w_k^{22} x_2(t-k) + \sum_{k=1}^{L_{21}} w_k^{21} u_1(t-k) \qquad (1)$$

where $w_k^{ij}$ is the k-th tap of the filter from source j to sensor i and $L_{ij}$ is the filter length for the respective filter.

The relationships between the mixing filters and the separation filters can be expressed in the z-transform domain [6]:

$$W_{11}(z) = A_{11}(z)^{-1}, \quad W_{12}(z) = -A_{12}(z)A_{11}(z)^{-1}$$

$$W_{22}(z) = A_{22}(z)^{-1}, \quad W_{21}(z) = -A_{21}(z)A_{22}(z)^{-1} \qquad (2)$$

This is a network which combines the separation and deconvolution problems. Maximizing the entropy at the output will result in $W_{11}$ and $W_{22}$ not only inverting $A_{11}$ and $A_{22}$, but also whitening the sources. This can be avoided by forcing $W_{11}$ and $W_{22}$ to be mere scaling coefficients. In the ideal case, the filters will have the following solutions:

$$W_{11}(z) = 1, \quad W_{12}(z) = -A_{12}(z)A_{22}(z)^{-1}$$

$$W_{22}(z) = 1, \quad W_{21}(z) = -A_{21}(z)A_{11}(z)^{-1} \qquad (3)$$

Further, when the feedback network is used, we have to consider the relations $U_1(z) = A_{11}(z)S_1(z)$ and $U_2(z) = A_{22}(z)S_2(z)$, which correspond to what each sensor would observe in the absence of interference from the other source.

The learning rules for the separation matrix are:

$$\Delta w_0^{ii} \propto (1 - 2y_i)\,x_i + 1/w_0^{ii},$$

$$\Delta w_k^{ii} \propto (1 - 2y_i)\,x_i(t-k),$$

$$\Delta w_k^{ij} \propto (1 - 2y_i)\,u_j(t-k) \qquad (4)$$

where $k = 0, 1, 2, \cdots, L_{ij}$.
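A minimal sketch of the causal feedback network and of the cross-filter update is given below. It is an illustration under assumptions, not the authors' implementation: the function names are hypothetical, the cross filters are taken to start at lag 1 so that the recursion can be evaluated sample by sample, and the toy mixture uses one-tap cross channels chosen so that the ideal solution with $W_{11} = W_{22} = 1$ is known in closed form.

```python
import numpy as np

def sigmoid(u):
    """Logistic nonlinearity y = 1 / (1 + exp(-u))."""
    return 1.0 / (1.0 + np.exp(-u))

def causal_feedback(x1, x2, w11, w22, w12, w21):
    """Causal feedback network.

    w11, w22 : direct taps for lags k = 0..L (w11[0] is the lag-0 tap)
    w12, w21 : cross taps for lags k = 1..L (w12[0] is the lag-1 tap; assumption)
    """
    n = len(x1)
    u1 = np.zeros(n)
    u2 = np.zeros(n)
    for t in range(n):
        acc1 = sum(w11[k] * x1[t - k] for k in range(len(w11)) if t - k >= 0)
        acc2 = sum(w22[k] * x2[t - k] for k in range(len(w22)) if t - k >= 0)
        acc1 += sum(w12[k - 1] * u2[t - k]
                    for k in range(1, len(w12) + 1) if t - k >= 0)
        acc2 += sum(w21[k - 1] * u1[t - k]
                    for k in range(1, len(w21) + 1) if t - k >= 0)
        u1[t], u2[t] = acc1, acc2
    return u1, u2

def cross_tap_update(u_i, u_j, t, n_taps):
    """Unscaled update (1 - 2 y_i(t)) u_j(t - k) for lags k = 1..n_taps."""
    y_i = sigmoid(u_i[t])
    return np.array([(1.0 - 2.0 * y_i) * u_j[t - k] if t - k >= 0 else 0.0
                     for k in range(1, n_taps + 1)])

# Toy mixture (assumed): A11 = A22 = 1, A12 = 0.5 z^-1, A21 = 0.3 z^-1.
rng = np.random.default_rng(2)
s1 = rng.standard_normal(200)
s2 = rng.standard_normal(200)
delay = lambda v: np.concatenate(([0.0], v[:-1]))
x1 = s1 + 0.5 * delay(s2)
x2 = s2 + 0.3 * delay(s1)

# Ideal cross filters: W12 = -A12 / A22 and W21 = -A21 / A11.
u1, u2 = causal_feedback(x1, x2, [1.0], [1.0], [-0.5], [-0.3])
step = cross_tap_update(u1, u2, t=10, n_taps=1)
```

With the ideal cross filters the network outputs reproduce the sources, which is a quick way to check an implementation before running the adaptive updates.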

3.  THE  IMPROVED  ICA  BASED  BSS 
METHOD 

Torkkola's algorithm [6] works only when stable inverses of the direct channel filters ($A_{11}$ and $A_{22}$) exist. This is not always guaranteed in real-world systems. In the separation of audio signals, the direct channel is the path from the source to the ipsi microphone. The corresponding transfer function comes from a very complex process, for which a stable inverse is not guaranteed to exist.

However, even if a filter does not have a stable causal inverse, there still exists a stable non-causal inverse (provided the filter has no zeros on the unit circle). Therefore, the algorithm of Torkkola can be modified and used even though there is no stable (causal) inverse filter for the direct channel.
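The existence of a stable non-causal inverse can be checked on a small example. The filter $h(z) = 1 - 2z^{-1}$ used here is an assumed toy case, not taken from the paper: its zero lies at z = 2, outside the unit circle, so its causal inverse has taps $2^k$ that grow without bound, whereas the anti-causal expansion $-\sum_{k \ge 1} 2^{-k} z^{k}$ decays and can be truncated.

```python
import numpy as np

def anticausal_inverse_taps(n_taps):
    """Taps of the truncated anti-causal inverse of h(z) = 1 - 2 z^-1.

    The returned array holds the taps at lags -n_taps, ..., -1 (future samples),
    with value -2**(-k) at lag -k.
    """
    k = np.arange(n_taps, 0, -1)
    return -(0.5 ** k)

h = np.array([1.0, -2.0])          # taps at lags 0 and 1
g = anticausal_inverse_taps(20)    # taps at lags -20 .. -1

# Cascade of filter and inverse; entries correspond to lags -20 .. 0.
cascade = np.convolve(g, h)

# For comparison, the causal inverse 1 / (1 - 2 z^-1) has taps 2**k.
causal_taps = 2.0 ** np.arange(20)
```

The cascade equals a unit impulse at lag 0 plus a residual of size $2^{-20}$ at lag -20, so the truncation error shrinks geometrically with the number of anti-causal taps, while the causal taps diverge.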

The relationships between the signals are now changed to:

$$u_1(t) = \sum_{k=-M}^{M} w_k^{11} x_1(t-k) + \sum_{k=-M}^{M} w_k^{12} u_2(t-k)$$

$$u_2(t) = \sum_{k=-M}^{M} w_k^{22} x_2(t-k) + \sum_{k=-M}^{M} w_k^{21} u_1(t-k) \qquad (5)$$

where M (even) is half of (total filter length - 1), so that the zero-lag tap is at position (M + 1) of the filter. In (5) there exists an initialization problem regarding the filtering. To calculate the value of $u_1(t)$, the values of $u_2(t), u_2(t+1), \cdots, u_2(t+M)$ are required, which are not initially available. Since learning is an iterative process, we have used some pre-assigned values to solve this filter initialization problem, or padded the signals with zeros of length M. For example, the input value $x_2(t)$ is used for the output $u_2(t)$ at the first iteration. The new values generated at the first iteration are then used for the second iteration. This process is repeated until convergence.

The derivation of the learning rule follows the same procedure as in Torkkola [6]. According to (5), only the coefficients of $W_{12}$ and $W_{21}$ have to be learned. The learning rule is the same in notation but different in nature because the values of k have changed:

$$\Delta w_k^{ij} \propto (1 - 2y_i)\,u_j(t-k) \qquad (6)$$

where $k = -M, -M+1, \cdots, M$.

The step-size is exponentially time-varying, and the initial step-size is calculated as $1/(2\lambda_{max})$, where $\lambda_{max}$ is the maximum singular value of the correlation matrix R for the initial input (mixed signal) block of length (2M + 1).
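The iterative evaluation of the non-causal network and the update (6) might be sketched as follows. Several points are assumptions made for illustration and are not confirmed by the text: the direct filters are fixed to 1 (the ideal case), signal edges are zero-padded, the outputs are refined by a simple fixed-point iteration started from $u_i = x_i$, and the update is averaged over the available samples rather than applied per sample. Function names are hypothetical.

```python
import numpy as np

def noncausal_outputs(x1, x2, w12, w21, n_iter=60):
    """Fixed-point evaluation of u1 = x1 + w12 * u2 and u2 = x2 + w21 * u1.

    w12, w21 : arrays of length 2M + 1; index k + M holds the tap at lag k,
               k = -M..M. Signals must be at least 2M + 1 samples long.
    The outputs are initialised with the inputs, then refined iteratively.
    """
    u1, u2 = x1.copy(), x2.copy()
    for _ in range(n_iter):
        new_u1 = x1 + np.convolve(u2, w12, mode="same")
        new_u2 = x2 + np.convolve(u1, w21, mode="same")
        u1, u2 = new_u1, new_u2
    return u1, u2

def cross_tap_gradient(u_i, u_j, m):
    """Sample average of (1 - 2 y_i(t)) u_j(t - k) for k = -M..M."""
    y_i = 1.0 / (1.0 + np.exp(-u_i))
    n = len(u_i)
    grad = np.zeros(2 * m + 1)
    for k in range(-m, m + 1):
        t = np.arange(max(0, k), min(n, n + k))  # keep 0 <= t - k < n
        grad[k + m] = np.mean((1.0 - 2.0 * y_i[t]) * u_j[t - k])
    return grad

# Toy mixture (assumed): source 2 leaks into sensor 1 with a one-sample advance,
# source 1 leaks into sensor 2 with a one-sample delay.
M = 2
rng = np.random.default_rng(3)
s1 = rng.standard_normal(300)
s2 = rng.standard_normal(300)
advance = lambda v: np.concatenate((v[1:], [0.0]))
delay = lambda v: np.concatenate(([0.0], v[:-1]))
x1 = s1 + 0.4 * advance(s2)
x2 = s2 + 0.3 * delay(s1)

w12 = np.zeros(2 * M + 1)
w21 = np.zeros(2 * M + 1)
w12[M - 1] = -0.4   # tap at lag k = -1 (uses the future sample u2(t + 1))
w21[M + 1] = -0.3   # tap at lag k = +1

u1, u2 = noncausal_outputs(x1, x2, w12, w21)
grad12 = cross_tap_gradient(u1, u2, M)
```

The fixed-point iteration converges here because the cross filters have gain below one; for larger gains a different solver would be needed.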

4.  RESULTS  AND  PERFORMANCES 

Separations of audio signals have been performed for various real mixed data, e.g. for two music signals, for two speech signals in the same language and in different languages, and for a music signal and a speech signal. In the following we present two illustrative results for the real audio files, which are available at http://www.cnl.salk.edu/~tewon/blind.html.¹

Example 1: In this example, two different music signals, sampled at 22 kHz, are separated. Fig. 1 shows small portions of the separation results. The original signals are shown in Figs. 1(a)-(b), whereas the mixed signals and separated signals are presented in Figs. 1(c)-(d) and Figs. 1(e)-(f), respectively. The unmixing filter length used is 161, which is the minimum filter length needed for successful separation. The learning process stops when the changes of the weights ($\Delta w_k^{12}$ and $\Delta w_k^{21}$) are less than a threshold value, which is set to 0.0001. The number of iterations required is then about 100. The audio signals can be listened to on our newly developed web page (http://members.tripod.com/zen76/index.htm). Satisfactory separation results are obtained from both subjective (listening) and objective (cross-correlation) performance tests. Fig. 2 shows very low cross-correlation values between the separated signals compared to those of the mixed signals, indicating the efficiency of the present method.
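The objective test mentioned above can be illustrated with a normalized cross-correlation over a range of lags. The exact measure used by the authors is not specified in the text, so the normalization (zero mean, unit variance), the lag range, and the synthetic signals below are assumptions made for illustration.

```python
import numpy as np

def normalized_cross_correlation(a, b, max_lag):
    """Cross-correlation of standardized signals for lags -max_lag..max_lag.

    The value at lag l is the sample mean of a(t + l) * b(t).
    """
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    n = len(a)
    lags = np.arange(-max_lag, max_lag + 1)
    values = np.array([
        np.mean(a[max(0, l):n + min(0, l)] * b[max(0, -l):n - max(0, l)])
        for l in lags
    ])
    return lags, values

# Synthetic stand-ins: two independent signals and an instantaneous mixture of them.
rng = np.random.default_rng(4)
s1 = rng.standard_normal(20000)
s2 = rng.standard_normal(20000)
m1 = s1 + 0.8 * s2
m2 = 0.8 * s1 + s2

_, cc_separated = normalized_cross_correlation(s1, s2, max_lag=20)
_, cc_mixed = normalized_cross_correlation(m1, m2, max_lag=20)
```

For well separated signals the values stay close to zero at every lag, while the mixed signals show a pronounced peak.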

Example 2: In Fig. 3 the separation results are illustrated for two recorded speech signals sampled at 16 kHz. The experiments were performed with two speakers speaking simultaneously in a normal office room [7]. Figs. 3(a)-(b) show a small portion of the mixed signals recorded from two microphones. The corresponding separated signals are shown in Figs. 3(c)-(d). The listening test shows that the speech separation is almost perfect. In this example, the filter length used is 321, which is the minimum filter length that provides good results. The stopping threshold is chosen to be the same as in Example 1, and 120 iterations were needed to reach it. It is found that the extreme value of the cross-correlation for the mixed signals in Example 2 is much higher than that of the mixed signals in Example 1. This could be the reason why a larger filter length and more iterations are required in Example 2 than in Example 1.

¹The details of the experimental setup for the audio sound recording can be found in http://www.cnl.salk.edu/~tewon/blind.html

Also note that this method can be extended to more than the two-source two-sensor case. An illustrative result for the three-source three-sensor case is shown at http://members.tripod.com/zen76/index.htm. It is found that the three sources can be successfully separated from their three mixtures.


[4] A. J. Bell and T. J. Sejnowski, “An information maximisation approach to blind separation”, Neural Computation, vol. 7, 1995, pp. 1129-1159.

[5] H. H. Szu, I. Kopriva, A. Persin, “Independent component analysis to resolve the multi-source limitation of the nutating rising-sun reticle based optical trackers”, Optics Communication, vol. 176, March 2000, pp. 77-89.

[6] K. Torkkola, “Blind separation of convolved sources based on information maximization”, IEEE Workshop Neural Networks for Signal Processing, Kyoto, Japan, Sept 4-6, 1996.

[7] T-W Lee, A. J. Bell and R. Orglmeister, “Blind source separation of real world signals”, Proc. IEEE Int. Conf. Neural Networks, June 97, Houston, pp. 2129-2135.


5.  COMPARISON 

The results are compared with Te-Won's results shown in [1, 7] (see also http://www.cnl.salk.edu/~tewon/blind.html). The limitation of Te-Won's method is that it requires a large filter length (e.g. 1024 samples), which is significantly reduced by the proposed method (e.g., for the cases in our simulation examples the reduction is 5 to 6 times). Moreover, according to Fig. 4, the cross-correlation values are lower for the presented method than for Te-Won's results. From the listening test it is also found that the separation quality is better for the proposed method (see http://members.tripod.com/zen76/index.htm to compare results as well as other examples). Here we do not compare our results with those of other more conventional methods, since it is found that Te-Won's method works much better than the other existing methods when real-world audio signals are used.

6.  DISCUSSION 

Separations of audio signals have been performed for various real-world signals using an efficient ICA based BSS method. Using a feedback network with non-causal filters, the length of the unmixing filters is successfully reduced. Satisfactory results are obtained from both the subjective (listening) and objective (cross-correlation) performance tests, and they improve on the results shown in [7]. The length of the separation filters and the required number of iterations may depend on the amount of cross-correlation between the recorded signals.
Figure 1: Separation of two music signals (between 0 s and 0.091 s); (a)-(b) Original source signals, (c)-(d) Mixed signals, (e)-(f) Separated signals.

Figure 2: Performance measure using cross-correlation; (a) Cross-correlation between mixed signals in Figs. 1(c)-(d), (b) Cross-correlation between separated signals in Figs. 1(e)-(f).

Figure 3: Separation of two speech signals (between 0.3125 s and 0.4375 s); (a)-(b) Mixed signals, (c)-(d) Separated signals.

Figure 4: Cross-correlation between the separated signals for the presented method (solid line) and Te-Won's method [7] (dash-dot line) for the results shown in Figs. 1(e)-(f) and http://www.cnl.salk.edu/~tewon/blind.html.
ABSTRACT

This paper investigates a novel closed-form estimation class, the so-called weighted estimator (WE), for blind source separation in the basic two-signal problem. Proper combination of previously proposed estimators yields consistent estimates of the separation parameters under general conditions. In the real-mixture case, we determine analytic expressions for the WE asymptotic (large-sample) variance and the source-dependent weight value of the most efficient estimator in the class. By means of the bicomplex-number formalism, the WE is extended to the complex-mixture scenario, for which Cramer-Rao bounds are also derived. Simulations compare the WE with other methods, demonstrating its potential.

Keywords: blind source separation, estimation theory, higher-order statistics, non-Gaussian signal processing, sensor array processing.


1.  INTRODUCTION 

The problem of blind source separation (BSS) arises in a great variety of applications, in fields as diverse as wireless communications, seismic exploration and biomedical signal processing. BSS aims to reconstruct an unknown set of q mutually independent source signals $x \in \mathbb{C}^q$ which appear mixed at the output of a p-sensor array $y \in \mathbb{C}^p$, $p \ge q$. In the noiseless instantaneous linear case, sources and observations are linked through an unknown mixing transformation $M \in \mathbb{C}^{p \times q}$:

$$y = Mx. \qquad (1)$$

The problem consists of estimating the source vector x and the mixing matrix M from the exclusive knowledge of the sensor vector y. Neither the ordering nor the power and phase-shift of the sources can be identified in the model above, so we may assume, with no loss of generality, an identity source covariance matrix.

When  the  time  structure  of  the  signals  cannot  be  exploited 
(e.g.,  due  to  the  source  spectral  whiteness),  one  needs  to  resort 
to  higher-order  statistics  (HOS)  [1],  The  success  of  the  separation 
then  relies  on  the  non-Gaussian  nature  of  the  sources.  A  previ¬ 
ous  spatial  whitening  process  (entailing  second-order  decorrela¬ 
tion  and  power  normalization)  helps  to  reduce  the  number  of  un¬ 
knowns,  resulting  in  a  set  of  normalized  uncorrelated  components 
z  E  Cq: 

z  =  Qx,  (2) 

with $Q \in \mathbb{C}^{q \times q}$ unitary. As the general scenario p > 2 can be
tackled through an iterative approach over the signal pairs [2], the
two-signal case, p = q = 2, is of fundamental importance. The
unitary transformation Q is then a complex elementary Givens rotation
matrix:

$$Q = \begin{bmatrix} \cos\theta & -e^{-j\alpha}\sin\theta \\ e^{j\alpha}\sin\theta & \cos\theta \end{bmatrix}. \qquad (3)$$

Hence, the source-signal extraction and mixing-matrix identification
reduce to the estimation of angular parameters $\theta, \alpha \in \mathbb{R}$.

Vicente Zarzoso would like to thank the Royal Academy of Engineering
for supporting this work through the award of a Post-doctoral Research
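The whitening step (2) and the rotation (3) can be illustrated for the real-valued case ($\alpha = 0$) with a short NumPy sketch. The function names `whiten` and `givens`, the mixing matrix and the uniform sources are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def whiten(y):
    """Return (z, W) such that z = W @ y has identity sample covariance."""
    y = y - y.mean(axis=1, keepdims=True)
    R = y @ y.T / y.shape[1]              # sample covariance of the sensor output
    lam, U = np.linalg.eigh(R)
    W = (U / np.sqrt(lam)).T              # W = diag(lam^-1/2) U^T
    return W @ y, W

def givens(theta):
    """Real-valued Givens rotation, i.e. eq. (3) with alpha = 0."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

rng = np.random.default_rng(0)
T = 100_000
# Two independent, zero-mean, unit-variance sources (assumed uniform here).
x = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, T))
M = np.array([[1.0, 0.6], [-0.4, 1.2]])   # hypothetical mixing matrix
z, W = whiten(M @ x)
Q = W @ M                                  # residual mixing after whitening
```

After whitening, the residual transformation `Q` is orthogonal up to sampling error, which is why only the angle $\theta$ remains to be estimated in the real case.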

In the real-valued mixture case, $\alpha = 0$ and only $\theta$ is unknown.
The performance of the first closed-form solution for the estimation
of $\theta$, based on the output 4th-order cross-cumulant nulling
[3], was later shown to depend on $\theta$ itself [4, 5]. The maximum-likelihood
(ML) approach on the Gram-Charlier expansion of the
source probability density function (pdf) produced the solution
of [6], whose validity was broadened through the extended ML
(EML) and the alternative EML (AEML) estimators [4, 7, 8]. Such
estimators lose their consistency for zero source kurtosis sum (sks)
and source kurtosis difference (skd), respectively. This deficiency
was overcome in [8] and [9]. In the latter, adopting the framework
of [6], the two estimators were joined into a single analytic expression,
the approximate ML (AML). The MaSSFOC estimator [10],
derived from the approximate maximization of a contrast function
made up of the sum of output squared kurtosis [2], exhibits a strikingly
similar form. The notion of linearly combining estimation
expressions using arbitrary weights was originally put forward
in [9], giving rise to the so-called weighted AML (WAML) estimator.
It was suggested that the weight parameter could be adjusted
by taking advantage of a priori information on the source pdfs,
although no specific guidelines were given on how the actual choice
should be made.

The  present  contribution  fills  this  gap  by  studying  in  finer  de¬ 
tail  this  weighted  estimator  (WE)  for  BSS  and  emphasizing  its 
potential  benefits.  In  the  real-mixture  case,  we  capitalize  on  the 
complex-centroid  notation  used  in  the  EML  and  AEML  estimators 
in  order  to  provide  an  analytic  formula  for  the  WE  large-sample 
variance.  From  this  formula,  the  weight  parameter  of  the  asymp¬ 
totically  most  efficient  WE  is  obtained  as  a  function  of  the  source 
statistics. In addition, the WE is neatly extended to the complex-valued
mixture case with the bicomplex number formalism developed
in [4, 11]. We deduce Cramér-Rao lower bounds (CRLBs)
for the pertinent parameters, and show in simulations that the WE
is  able  to  follow  the  CRLB  trend  of  an  objective  separation-quality 
performance  index.  The  connections  between  the  WE  and  other 
analytic  solutions  are  also  highlighted  throughout  the  paper. 

First, we summarize a few mathematical notations. Symbol
$\mu^x_{mn} = E[x_1^m x_2^n]$, where $E[\cdot]$ denotes the mathematical
expectation, stands for the $(m + n)$th-order moment of the source
signals $x = (x_1, x_2)$. For convenience, the cumulants of complex
vector $z = (z_1, \ldots, z_q)$ are defined as
$\mathrm{Cum}^z_{i_1 i_2 i_3 \ldots} = \mathrm{Cum}[z_{i_1}, z_{i_2}, z_{i_3}, \ldots]$, $1 \le i_k \le q$,
with the convention, in the two-component case,
$\kappa^z_{n-r,r} = \mathrm{Cum}^z_{1 \ldots 1\, 2 \ldots 2}$ (with $n - r$ indices equal to 1
and $r$ indices equal to 2). We also define
$\gamma = \kappa^x_{40} + \kappa^x_{04}$ (sks) and $\eta = \kappa^x_{40} - \kappa^x_{04}$ (skd).
Symbol $\angle a$ represents the principal value of the argument of $a \in \mathbb{C}$.

2.  REAL-MIXTURE  CASE 
2.1.  Fourth-Order  Weighted  Estimator 

The WAML estimator [9] accepts a more convenient formulation
when adopting the EML/AEML approach [4, 5, 7, 8], which is
based on the polar representation of real-valued bivariate random
vector $z = (z_1, z_2)$ as $\rho e^{j\phi} = z_1 + j z_2$, $j = \sqrt{-1}$.
Higher-order expectations then generate complex-valued linear
combinations (centroids) of the whitened-sensor statistics which lead to
explicit estimation expressions for the parameter of interest.
Accordingly, the EML is expressed as

$$\hat\theta_{EML} = \tfrac{1}{4}\angle(\gamma\,\xi_4), \qquad (4)$$

where $\xi_4$ is the 4th-order complex centroid:

$$\xi_4 = E[\rho^4 e^{j4\phi}] = (\kappa^z_{40} + \kappa^z_{04} - 6\kappa^z_{22}) + j4(\kappa^z_{31} - \kappa^z_{13}), \qquad (5)$$

and the sks can be estimated from the array output through
$\gamma = E[\rho^4] - 8 = \kappa^z_{40} + \kappa^z_{04} + 2\kappa^z_{22}$. Similarly, the AEML [4, 8] reads:

$$\hat\theta_{AEML} = \tfrac{1}{2}\angle\,\xi_2, \qquad (6)$$

$$\xi_2 = E[\rho^4 e^{j2\phi}] = (\kappa^z_{40} - \kappa^z_{04}) + j2(\kappa^z_{31} + \kappa^z_{13}). \qquad (7)$$
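As a minimal sketch of how centroids (5) and (7) and the angle estimates (4) and (6) might be computed from whitened samples, the NumPy code below uses their polar forms. The helper names (`centroids`, `eml`, `aeml`, `angle_error`) and the uniform/binary test sources are assumptions made for illustration only.

```python
import numpy as np

def centroids(z):
    """Sample versions of xi_4 (5), xi_2 (7) and the sks from whitened real data z (2 x T)."""
    c = z[0] + 1j * z[1]             # rho * exp(j*phi)
    rho2 = np.abs(c) ** 2
    xi4 = np.mean(c ** 4)            # E[rho^4 exp(j4phi)]
    xi2 = np.mean(rho2 * c ** 2)     # E[rho^4 exp(j2phi)]
    gamma = np.mean(rho2 ** 2) - 8.0 # sks estimate, E[rho^4] - 8
    return xi4, xi2, gamma

def eml(z):
    xi4, _, gamma = centroids(z)
    return 0.25 * np.angle(gamma * xi4)      # eq. (4)

def aeml(z):
    _, xi2, _ = centroids(z)
    return 0.5 * np.angle(xi2)               # eq. (6)

def angle_error(est, true):
    """Estimation error modulo pi/2 (the inherent indeterminacy)."""
    return (est - true + np.pi / 4) % (np.pi / 2) - np.pi / 4

rng = np.random.default_rng(1)
T = 100_000
# Assumed test sources: uniform (kurtosis -1.2) and binary +/-1 (kurtosis -2).
x = np.vstack([rng.uniform(-np.sqrt(3), np.sqrt(3), T),
               rng.choice([-1.0, 1.0], T)])
theta = np.deg2rad(15.0)
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]])
z = Q @ x
```

With these sources both the sks and the skd are non-zero, so both estimates should land close to 15° (modulo 90°).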


Under mild conditions [4, 7], centroids $\xi_4$ and $\xi_2$ are consistent
estimators of $\gamma e^{j4\theta}$ and $\eta e^{j2\theta}$, respectively, so that
$\hat\theta_{EML}$ and $\hat\theta_{AEML}$ consistently estimate $\theta$ as long as
$\gamma \neq 0$ and $\eta \neq 0$, respectively. It follows that

$$\hat\theta_{WE} = \tfrac{1}{4}\angle\,\xi_{WE}, \quad \text{with} \qquad (8)$$

$$\xi_{WE} = w\gamma\xi_4 + (1 - w)\xi_2^2, \quad 0 \le w \le 1, \qquad (9)$$

is a consistent estimator of $\theta$ for any source distribution (except
when the sources are both Gaussian). Eqn. (8) is essentially the
WAML estimator [9] written in centroid form. Nonetheless, we
adhere to the more general denomination of weighted estimator
(WE), since its ML nature becomes unclear when extended to the
complex-signal domain (Section 3).

Some special cases of the WE are:

(i) w = 0: AEML estimator of [4, 8].
(ii) w = 1/3: AML estimator of [9].
(iii) w = 1/2: MaSSFOC estimator of [10].
(iv) w = 1: EML estimator of [4, 7].

2.2. Performance Analysis

Along the lines of [4, 5], and omitting tedious algebraic details, the
asymptotic (large-sample) variance of the WE (8) is determined as:

$$\sigma^2_{WE} = \frac{E\left\{\left[w\gamma(x_1^3x_2 - x_1x_2^3) + (1 - w)\eta(x_1^3x_2 + x_1x_2^3)\right]^2\right\}}{T\left[w\gamma^2 + (1 - w)\eta^2\right]^2} \qquad (10)$$

where T is the number of samples. Remark that:

(i) $\sigma^2_{WE}$ reduces to the asymptotic variance of the AEML and
EML estimators [4, 5] for w = 0 and w = 1, respectively.

(ii) When $\gamma = 0$ (resp. $\eta = 0$), WE performance reduces to
that of the AEML (resp. EML) estimator, for any 0 < w < 1.

Fig. 1. ISR vs. sample size. Uniform-Rayleigh sources, $\theta = 15°$,
$\nu$ independent Monte Carlo runs, with $\nu T = 5 \times 10^6$. Solid lines:
average empirical values. Dashed lines: asymptotic variances (10).

2.3. Optimal Large-Sample Performance

If $|\kappa^x_{40}| \neq |\kappa^x_{04}|$, the derivative of eqn. (10) with respect to w
cancels at:

$$w_{opt} = \frac{1}{2} + \frac{\mu^x_{40}\mu^x_{04}\left[(\kappa^x_{40})^2 - (\kappa^x_{04})^2\right] + \kappa^x_{40}\kappa^x_{04}(\mu^x_{60} - \mu^x_{06})}{2\left[(\kappa^x_{40})^2\mu^x_{06} - (\kappa^x_{04})^2\mu^x_{60}\right]} \qquad (11)$$

Since $\partial^2(\sigma^2_{WE})/\partial w^2|_{w_{opt}} > 0$, $w_{opt}$ corresponds to the
minimum variance estimator of the WE family. Hence, given the
source statistics, one can select the WE with optimal asymptotic
performance. If $w_{opt} \notin [0, 1]$, we choose between $w_{opt} = 0$
(AEML) and $w_{opt} = 1$ (EML) the value that gives the lowest
$\sigma^2_{WE}|_{w_{opt}}$ in (10).
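The relation between the asymptotic variance (10) and the stationary point (11) can be checked numerically. The sketch below expands the expectation in (10) under the assumption of independent, unit-variance sources, so that $E[x_1^6x_2^2] = \mu_{60}$, $E[x_1^2x_2^6] = \mu_{06}$ and $E[x_1^4x_2^4] = \mu_{40}\mu_{04}$. The function names and the example source statistics (uniform and binary) are illustrative choices, not values from the paper.

```python
import numpy as np

def we_variance(w, k40, k04, m40, m04, m60, m06, T=1.0):
    """Asymptotic variance (10), expanded for independent unit-variance sources."""
    g, e = k40 + k04, k40 - k04              # sks and skd
    A = w * g + (1 - w) * e                  # coefficient of x1^3 x2
    B = (1 - w) * e - w * g                  # coefficient of x1 x2^3
    num = A**2 * m60 + B**2 * m06 + 2 * A * B * m40 * m04
    return num / (T * (w * g**2 + (1 - w) * e**2) ** 2)

def w_opt(k40, k04, m40, m04, m60, m06):
    """Stationary point (11); requires |k40| != |k04|."""
    num = m40 * m04 * (k40**2 - k04**2) + k40 * k04 * (m60 - m06)
    den = 2.0 * (k40**2 * m06 - k04**2 * m60)
    return 0.5 + num / den

# Assumed example: uniform source (kurtosis -1.2) and binary +/-1 source (kurtosis -2).
stats = dict(k40=-1.2, k04=-2.0, m40=1.8, m04=1.0, m60=27 / 7, m06=1.0)
w_star = w_opt(**stats)

# Brute-force minimization of (10) over a grid, for comparison.
grid = np.linspace(0.0, 1.0, 10001)
w_grid = grid[np.argmin(we_variance(grid, **stats))]
```

For this pair of sources the closed-form weight falls inside [0, 1] and agrees with the grid search; when it falls outside, the text's rule of comparing the variances at w = 0 and w = 1 applies.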

2.4.  Simulation  Results 

A few simulations illustrate the benefits of the WE and show the
goodness of asymptotic approximation (10). First, observe that
any angle estimate of the form $\hat\theta = \theta + n\pi/2$, $n \in \mathbb{Z}$, provides
a valid separation solution up to the indeterminacies mentioned in
Sec. 1. The interference-to-signal ratio (ISR) performance index
[1] approximates the variance of $\hat\theta$, $\sigma^2_{\hat\theta}$, around any valid
separation solution [4]. The ISR is an objective measure of separation
performance, for it is method independent.
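The weighted estimator (8)-(9) and the $\pi/2$ indeterminacy of the angle can be sketched as follows. The function names and the choice of two uniform sources (for which the skd is zero, so the AEML alone would fail) are assumptions for illustration.

```python
import numpy as np

def weighted_estimator(z, w):
    """WE angle estimate (8)-(9) from whitened real data z (2 x T)."""
    c = z[0] + 1j * z[1]
    rho2 = np.abs(c) ** 2
    xi4 = np.mean(c ** 4)
    xi2 = np.mean(rho2 * c ** 2)
    gamma = np.mean(rho2 ** 2) - 8.0
    xi_we = w * gamma * xi4 + (1 - w) * xi2 ** 2
    return 0.25 * np.angle(xi_we)

def wrap_quarter_turn(delta):
    """Map an angle difference to [-pi/4, pi/4), i.e. modulo pi/2."""
    return (delta + np.pi / 4) % (np.pi / 2) - np.pi / 4

rng = np.random.default_rng(2)
T = 50_000
x = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, T))   # identical pdfs: skd = 0
theta = np.deg2rad(15.0)
c_, s_ = np.cos(theta), np.sin(theta)
z = np.array([[c_, -s_], [s_, c_]]) @ x
err = wrap_quarter_turn(weighted_estimator(z, 0.5) - theta)
```

Errors are compared modulo $\pi/2$ because any estimate of the form $\theta + n\pi/2$ separates the sources equally well.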

Fig. 1 shows the ISR results obtained by the EML, AEML,
AML, MaSSFOC and optimal WE, together with the expected
asymptotic variances, for varying sample size and i.i.d. sources
with uniform and Rayleigh distributions [$w_{opt} = 0.7141$, from
eqn. (11)]. Centroids are computed from their polar forms. The
optimal WE substantially outperforms the other estimators, being,
e.g., five and ten times as efficient [12] as the AML and the AEML,
respectively. Asymptotic approximation (10) fits the empirical
results very closely in all cases.

The generalized Gaussian distribution (GGD) with shape parameter
$\lambda$, $p(x) \propto \exp(-|x|^\lambda)$, is used as source pdf in the simulation
of Fig. 2. We fix $\kappa^x_{04} = 0.5$ and smoothly vary $\kappa^x_{40}$ to generate
a range of sks and skd values. The optimal WE, with $w_{opt}$ calculated
as in Sec. 2.3 and shown in Fig. 3, is compared with other
analytic solutions and the CRLB obtained in [9] for the real case.
The optimal WE follows the CRLB more closely than any of the
other methods.

Fig. 2. ISR vs. sks $\gamma$ and skd $\eta$. GGD sources, $\kappa^x_{04} = 0.5$,
$\theta = 15°$, $T = 5 \times 10^3$ samples, $10^3$ Monte Carlo runs.

Fig. 3. Optimal value of the WE weight parameter in the separation
scenario of Fig. 2.

3.  COMPLEX-MIXTURE  CASE 

3.1.  Bicomplex  Numbers 

In [4, 11], the so-called bicomplex numbers prove useful in simplifying
the development of closed-form estimators in the complex-mixture
scenario. Given a unitary matrix
$Q = \begin{bmatrix} a & -b^* \\ b & a^* \end{bmatrix}$, $a, b \in \mathbb{C}$,
where $^*$ denotes complex conjugation, the associated bicomplex
number is defined as $x = a + \jmath b$. Though analogous to $j$, the
bimaginary unit $\jmath$ is actually a distinct algebraic element. Terms
$a = \mathbb{R}\mathrm{e}(x)$ and $b = \mathbb{I}\mathrm{m}(x)$ are the breal and bimaginary parts
of $x$, respectively. The product of two bicomplex numbers
$x_1 = a_1 + \jmath b_1$ and $x_2 = a_2 + \jmath b_2$ is defined in accordance with the
product of unitary transformations:

$$x_1 x_2 = (a_1a_2 - b_1^*b_2) + \jmath(b_1a_2 + a_1^*b_2). \qquad (12)$$

In this manner, an isomorphism is created between the set of unitary
matrices under usual matrix product and the set of bicomplex
numbers under the above product operation. Note that, as with $j$,
$\jmath^2 = -1$. A special class of bicomplex numbers arises when the
associated unitary transformation shows the shape of (3):

$$e^{\jmath\theta} = \cos\theta + \jmath e^{j\alpha}\sin\theta, \qquad (13)$$

which we call bicomplex exponential.
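A small sketch may help make the product rule (12) and its link with 2x2 matrix multiplication concrete. The class name `Bicomplex` and the helper `bexp` are hypothetical; only the product rule and the matrix form follow the text.

```python
import numpy as np
from dataclasses import dataclass

@dataclass(frozen=True)
class Bicomplex:
    a: complex   # breal part
    b: complex   # bimaginary part

    def __mul__(self, other):
        # Product rule (12)
        return Bicomplex(self.a * other.a - np.conj(self.b) * other.b,
                         self.b * other.a + np.conj(self.a) * other.b)

    def matrix(self):
        # Associated 2x2 matrix [[a, -b*], [b, a*]]
        return np.array([[self.a, -np.conj(self.b)],
                         [self.b, np.conj(self.a)]])

def bexp(theta, alpha):
    """Bicomplex exponential (13)."""
    return Bicomplex(np.cos(theta), np.exp(1j * alpha) * np.sin(theta))

x1 = Bicomplex(0.3 + 0.4j, -0.5 + 0.1j)
x2 = Bicomplex(-0.2 + 0.7j, 0.6 - 0.3j)
prod = x1 * x2
unit = Bicomplex(0.0, 1.0)          # the bimaginary unit
sq = unit * unit
dbl = bexp(0.2, 1.1) * bexp(0.2, 1.1)
```

Multiplying two bicomplex exponentials with the same $\alpha$ adds their angles, which is the property the complex-mixture centroids rely on.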

3.2.  Fourth-order  Weighted  Estimator 

By means of the bicomplex formalism, one can easily generalize
centroids (5) and (7) to the complex-mixture case. Effectively,

$$\xi_4 = (\kappa^z_{40} + \kappa^z_{04} - 6\kappa^z_{22}) + \jmath 4(\kappa^z_{31} - \kappa^z_{13}) \qquad (14)$$

and

$$\xi_2 = (\kappa^z_{40} - \kappa^z_{04}) + \jmath 2(\kappa^z_{31} + \kappa^z_{13}) \qquad (15)$$

are consistent estimators of $\gamma e^{\jmath 4\theta}$ and $\eta e^{\jmath 2\theta}$, respectively, under
the same general conditions as in the real case. Centroid (14) gives
rise to the complex EML (CEML) estimator [4, 11], whereas (15)
yields the complex AEML (CAEML) estimator [4]. Bearing in
mind the bicomplex product (12), it follows immediately that the
linear combination

$$\xi_{CWE} = w\gamma\xi_4 + (1 - w)\xi_2^2 \qquad (16)$$

consistently estimates $(w\gamma^2 + (1 - w)\eta^2)e^{\jmath 4\theta}$. The sks $\gamma$ may
be obtained from the available data just as in the real case. For
$w \in [0, 1]$, parameters $(\theta, \alpha)$ are estimated through

$$4\hat\theta_{CWE} = \angle\left(\mathbb{R}\mathrm{e}(\xi_{CWE}) + j\,|\mathbb{I}\mathrm{m}(\xi_{CWE})|\right), \qquad \hat\alpha_{CWE} = \angle\,\mathbb{I}\mathrm{m}(\xi_{CWE}), \qquad (17)$$

which is the complex WE (CWE).
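The extraction of the two angles in (17) can be illustrated with a short sketch that takes the breal and bimaginary parts of the centroid as two ordinary complex numbers. The function name `cwe_angles` is hypothetical, and taking the real part of the breal component is an assumption made here to guard against numerical residue.

```python
import numpy as np

def cwe_angles(breal, bimag):
    """Angle estimates (17) from the breal and bimaginary parts of xi_CWE."""
    theta = 0.25 * np.angle(np.real(breal) + 1j * np.abs(bimag))
    alpha = np.angle(bimag)
    return theta, alpha

# Noise-free centroid K * exp(jj*4*theta) = K * (cos 4theta + jj * e^{j alpha} sin 4theta)
K, theta0, alpha0 = 2.0, np.deg2rad(15.0), np.deg2rad(65.0)
breal = K * np.cos(4 * theta0)
bimag = K * np.exp(1j * alpha0) * np.sin(4 * theta0)
theta_hat, alpha_hat = cwe_angles(breal, bimag)
```

In this noise-free example the two angles are recovered exactly; the values 15° and 65° match those quoted in the Fig. 4 caption.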

3.3.  Cramer-Rao  Lower  Bounds 

Assuming circularly distributed source signals composed of T
independent samples, the Fisher information matrix (FIM) for the
estimation of parameters $(\theta, \alpha)$ in model (2)-(3) reads:


and $p_k(u, v)$ is the pdf of the k-th source signal $x_k = u_k + jv_k$,
$u_k, v_k \in \mathbb{R}$, k = 1, 2. Integration extends over the definition
domain $D_k$ of the corresponding random variable.
It is interesting to note that:

(i) The CRLBs of $\theta$ and $\alpha$ are decoupled, and therefore:

$$\mathrm{CRLB}_\theta = (TI)^{-1} \qquad (20)$$

$$\mathrm{CRLB}_\alpha = 4(TI\sin^2 2\theta)^{-1} \qquad (21)$$

(ii) For sources with complex generalized Gaussian distribution
(CGGD) of shape parameter $\lambda$, given by

$$p(u, v) \propto \exp\{-(u^2 + v^2)^{\lambda/2}\}, \quad \lambda > 0, \qquad (22)$$

we have

$$I_k = \tfrac{1}{4}\lambda_k^2\,\Gamma(4/\lambda_k)/\Gamma^2(2/\lambda_k). \qquad (23)$$

Then, the FIM is zero, and hence the model unidentifiable, iff
$\lambda_1 = \lambda_2 = 2$, i.e., both sources are Gaussian.

(iii) When $\theta = n\pi/2$, $\forall n \in \mathbb{Z}$, estimation of $\alpha$ becomes
unfeasible. However, in such cases the correct estimation of $\alpha$
does not affect the source extraction, e.g., if $\theta = 0$, Q in (3) is
just an identity matrix; if $\theta = \pi/2$, Q only contains off-diagonal
phase factors which are 'absorbed' by the source signals.

(iv) Endorsing the previous point we have that, for accurate
estimates of $(\theta, \alpha)$, $\mathrm{ISR} \approx \sigma^2_{\hat\theta} + \tfrac{1}{4}\sigma^2_{\hat\alpha}\sin^2 2\theta$, so that ISR is lower
bounded by $2 \times \mathrm{CRLB}_\theta$. When $\theta = n\pi/2$, $n \in \mathbb{Z}$, and if $\hat\theta$ is
still precise enough, this bound decreases to $\mathrm{CRLB}_\theta$. That is, the
lower bound of separation-performance objective measure ISR is
independent of $\theta$ and is (asymptotically) determined by the source
statistics only [via $I$ in (19)].
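The bounds (20)-(21) and the CGGD quantity (23) are easy to evaluate numerically. In the sketch below, the information term $I$ of (19) is treated as a given input because its expression is not reproduced here; the prefactor in `cggd_information` follows (23) read as $\tfrac{1}{4}\lambda_k^2\Gamma(4/\lambda_k)/\Gamma^2(2/\lambda_k)$, which is an assumption about the garbled original, as is the ISR approximation used in `isr_bound`. All function names are hypothetical.

```python
import math

def cggd_information(lam):
    """I_k of eq. (23) for a CGGD source with shape parameter lam (assumed form)."""
    return 0.25 * lam**2 * math.gamma(4.0 / lam) / math.gamma(2.0 / lam) ** 2

def crlb_theta(T, I):
    """Eq. (20)."""
    return 1.0 / (T * I)

def crlb_alpha(T, I, theta):
    """Eq. (21); undefined when theta is a multiple of pi/2."""
    return 4.0 / (T * I * math.sin(2.0 * theta) ** 2)

def isr_bound(T, I, theta):
    """Lower bound on ISR, assuming ISR ~ var(theta) + 0.25 * var(alpha) * sin^2(2 theta)."""
    return crlb_theta(T, I) + 0.25 * crlb_alpha(T, I, theta) * math.sin(2.0 * theta) ** 2

T, I, theta = 5000, 0.8, math.radians(15.0)   # I = 0.8 is an arbitrary illustrative value
bound = isr_bound(T, I, theta)
```

Under these assumptions the bound equals twice the CRLB of $\theta$ for any $\theta$ that is not a multiple of $\pi/2$, and the CGGD term equals 1 in the Gaussian case $\lambda = 2$.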

3.4.  Simulation  Results 

A simple simulation experiment compares the behaviour of the
CEML, CAEML and CWE (with w = 1/3 and w = 1/2, which
would correspond to the complex extensions of AML and MaSSFOC,
resp.). Two independent CGGDs are used as sources. Average
ISR results as a function of sks and skd are displayed in Fig. 4.
As expected, the CEML and CAEML worsen near $\gamma = 0$ and
$\eta = 0$, respectively. By contrast, the CWE maintains a satisfactory
separation in both tested cases over the whole range of $\gamma$ and $\eta$, and, as
occurred in the real case (Fig. 2), its performance follows closely
the CRLB trend.

4.  CONCLUSIONS  AND  OUTLOOK 

A  new  class  of  closed-form  estimators  of  the  separation  parameters 
in  the  fundamental  two-signal  instantaneous  linear  mixture  BSS 
problem  has  been  investigated.  A  weighted  estimator  (WE)  arises 
from  the  linear  combination  of  the  EML  and  AEML  centroids, 
and  produces  consistent  estimates  under  rather  general  conditions 
(essentially,  if  at  most  one  source  is  Gaussian).  For  real-valued 
mixtures,  prior  knowledge  on  the  source  statistics  can  be  exploited 
by selecting the WE with optimal large-sample performance (minimum
asymptotic variance). With the aid of the bicomplex numbers,
the WE has also been extended to the complex-mixture case,
where it has shown a performance variation similar to the CRLB,
which we have derived for circular sources.

Paths  of  further  research  include  the  asymptotic  performance 
analysis  of  the  WE  in  the  complex  environment,  which  is  of  rel¬ 
evance  in  areas  as  important  as  digital  communications.  Also,  in 
order  to  enable  a  fully  blind  operation,  it  is  necessary  to  develop 
the  optimal  weight  coefficient  as  a  function  of  the  array-output 
statistics.  The  estimator’s  behaviour  in  the  presence  of  additive 
noise  and  impulsive  interference  needs  to  be  explored  as  well. 


Fig. 4. ISR vs. sks $\gamma$ and skd $\eta$. CGGD sources, $\kappa^x_{04} = 0.5$,
$\theta = 15°$, $\alpha = 65°$, $T = 5 \times 10^3$ samples, $10^3$ independent Monte
Carlo iterations.
The "Algebraic Constant Modulus Algorithm" (ACMA) is a non-iterative
block algorithm for blind separation of constant modulus
sources. We previously showed that, unlike CMA, it asymptotically
converges to the (non-blind) Wiener receiver. In this paper,
we present a finite sample statistical performance analysis. This
can be used to predict the SINR performance, as well as the deviation
from the Wiener receivers. The theoretical performance is
illustrated by numerical simulations and shows a good match.

1.  INTRODUCTION 

In this paper we study the performance of ACMA ("Analytical
Constant Modulus Algorithm"), proposed in [1]. ACMA is a non-recursive
blind source separation algorithm for constant modulus
signals. It is a batch algorithm that under noise-free conditions can
compute exact separating beamformers for all sources at the same
time, using only a small number of samples. Although it has been
derived as a deterministic method, it is closely related to JADE and
other fourth-order statistics based source separation techniques.

We could recently show that (unlike CMA), ACMA beamformers
converge asymptotically in the number of samples to the
(non-blind) Wiener receivers [2]. Here, we will extend the analysis
by deriving the large finite sample performance of a block of
N samples. For this we need the statistics of the eigenvectors of a
fourth order covariance matrix with non-Gaussian sources.

2.  DATA  MODEL 

We consider a linear data model of the form

$$x_k = A s_k + n_k, \qquad (1)$$

where $x_k \in \mathbb{C}^M$ is the data vector received by an array of M sensors
at time k, $s_k \in \mathbb{C}^d$ is the source vector at time k, and
$n_k \in \mathbb{C}^M$ an additive noise vector. $A = [a_1 \cdots a_d]$ represents an $M \times d$
complex-valued instantaneous mixing matrix (or array response
matrix). The sources are constant modulus (CM), i.e. each entry
$s_i$ of $s$ satisfies $|s_i| = 1$.

We collect N samples in a matrix $X = [x_1, \cdots, x_N] : M \times N$.
Similarly defining $S : d \times N$ and $N : M \times N$, we obtain

$$X = AS + N. \qquad (2)$$

A, S and N are unknown. The objective is to reconstruct S using
linear beamforming, i.e., to find a beamforming matrix
$W = [w_1, \cdots, w_d] \in \mathbb{C}^{M \times d}$ of full row rank d such that $\hat S = W^H X$
approximates S. Since S is unknown, the criterion for this is that $\hat S$
should be as close to a CM matrix as possible, i.e., we aim to make
$|\hat S_{ik}| = |w_i^H x_k| = 1$ $\forall i, k$. If this is the case, then $\hat S$ is equal to S up
to unknown permutations and unit-norm scalings of its rows. With
noise, we can obviously recover the sources only approximately.
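The data model (1)-(2) and the constant-modulus criterion can be illustrated with a small simulation. The sketch below uses a non-blind zero-forcing beamformer (built from the known A) purely to show that the CM property holds exactly without noise and only approximately with noise; it is not the ACMA algorithm. Dimensions, noise level and variable names are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
M, d, N = 4, 2, 200
A = rng.standard_normal((M, d)) + 1j * rng.standard_normal((M, d))
S = np.exp(2j * np.pi * rng.random((d, N)))           # constant-modulus sources, |s| = 1
sigma = 0.1
Noise = sigma / np.sqrt(2) * (rng.standard_normal((M, N))
                              + 1j * rng.standard_normal((M, N)))
X = A @ S + Noise                                      # eq. (2)

# Non-blind zero-forcing beamformer, W^H = pinv(A), used here only as a reference.
W_zf = np.linalg.pinv(A).conj().T
S_clean = W_zf.conj().T @ (A @ S)                      # noise-free output
S_noisy = W_zf.conj().T @ X
cm_error_clean = np.max(np.abs(np.abs(S_clean) - 1.0))
cm_error_noisy = np.mean((np.abs(S_noisy) - 1.0) ** 2)
```

The noise-free outputs have unit modulus to machine precision, whereas the noisy outputs deviate from it, which is the situation the CM criterion is meant to handle.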

We  work  under  the  following  assumptions: 


1. $N > d^2$. A has full rank d, and $M \geq d$. To avoid complications
in the analysis, we assume M = d.

2. The sources are statistically independent constant modulus
sources, circularly symmetric, with covariance
$R_s := E(ss^H) = I$.

3. The noise is additive white Gaussian, zero mean, circularly
symmetric, independent from the sources, with covariance
$R_n := E(nn^H) = \sigma^2 I$.

Notation. Overbar ($\bar{\ }$) denotes complex conjugation, $^T$ is the matrix
transpose, $^H$ the matrix complex conjugate transpose, $^\dagger$ the matrix
pseudo-inverse (Moore-Penrose inverse). $I$ (or $I_p$) is the $(p \times p)$
identity matrix; $e_i$ is its i-th column. 0 and 1 are vectors with all
entries equal to 0 and 1, respectively. vec(A) is a stacking of the
columns of a matrix A into a vector. For a vector, diag(v) is a diagonal
matrix with the entries of v on the diagonal. $\odot$ is the Schur-Hadamard
(entry-wise) matrix product, $\otimes$ is the Kronecker product,
$\circ$ is the Khatri-Rao product, which is a column-wise Kronecker
product. $E(\cdot)$ denotes the expectation operator.

For a matrix-valued stochastic variable $\hat R$, define its covariance
matrix $\mathrm{cov}\{\hat R\} = E\{[\mathrm{vec}(\hat R - E(\hat R))][\mathrm{vec}(\hat R - E(\hat R))]^H\}$.

For a zero mean random vector $x = [x_i]$, define the fourth order
cumulant matrix

$$K_x = E(\bar x \otimes x)(\bar x \otimes x)^H - E(\bar x \otimes x)E(\bar x \otimes x)^H - E(\bar x \bar x^H) \otimes E(xx^H) - E(\bar x \otimes 1)(1 \otimes x)^H \odot E(1 \otimes x)(\bar x \otimes 1)^H.$$

For circularly symmetric variables, the last term vanishes.

3.  FORMULATION  OF  THE  ALGORITHM 

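In brief outline, ACMA consists of two main steps: a prewhitening
operation, and the algorithm proper. Define the data covariance
matrix and its sample estimate

$$R_x := E\{xx^H\}, \qquad \hat R_x := \frac{1}{N}\sum_k x_k x_k^H.$$

Assuming that M = d for simplicity of the analysis, the prewhitening
filter transforms the data to

$$\underline{X} := \hat R_x^{-1/2} X =: \underline{A}S + \underline{N}$$

where the underscore indicates the prewhitening. Note that $\hat R_{\underline{x}} = I$.

Given the N data samples $[x_k]$, the purpose of a beamforming
vector w is to recover one of the sources as $\hat s_k = w^H x_k$. One technique
for estimating such a beamformer is by minimizing the deterministic
CMA(2,2) cost function, $\hat w = \arg\min_w \frac{1}{N}\sum_k (|w^H x_k|^2 - 1)^2$.
Define

$$\hat C_{\underline{x}} := \frac{1}{N}\sum_k (\bar{\underline{x}}_k \otimes \underline{x}_k)(\bar{\underline{x}}_k \otimes \underline{x}_k)^H - \Big[\frac{1}{N}\sum_k \bar{\underline{x}}_k \otimes \underline{x}_k\Big]\Big[\frac{1}{N}\sum_k \bar{\underline{x}}_k \otimes \underline{x}_k\Big]^H.$$

In [2], we have derived that CMA(2,2) is equivalent to (up to a scaling
of w which is not of interest to its performance)

$$\hat w = \hat R_x^{-1/2}\,\hat t, \qquad \hat t = \arg\min_{y = \bar t \otimes t,\ \|y\| = 1} y^H \hat C_{\underline{x}}\, y. \qquad (3)$$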
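The Kronecker reformulation behind (3) rests on the identity $|w^Hx|^2 = (\bar x \otimes x)^H(\bar w \otimes w)$. The following sketch checks this numerically and shows how the CMA(2,2) cost splits into a quadratic form in a fourth-order sample matrix plus a squared offset. Variable names such as `C_hat` and `v_mean` are illustrative, and the sample matrix is built as the sample covariance of the Kronecker vectors, which is an assumption about the definition lost after "Define".

```python
import numpy as np

rng = np.random.default_rng(4)
d, N = 3, 50
X = rng.standard_normal((d, N)) + 1j * rng.standard_normal((d, N))
w = rng.standard_normal(d) + 1j * rng.standard_normal(d)

def cma22_cost(w, X):
    """Deterministic CMA(2,2) cost (1/N) sum_k (|w^H x_k|^2 - 1)^2."""
    return np.mean((np.abs(w.conj() @ X) ** 2 - 1.0) ** 2)

y = np.kron(w.conj(), w)
# Columns conj(x_k) kron x_k, shape (d^2, N)
V = np.stack([np.kron(X[:, k].conj(), X[:, k]) for k in range(N)], axis=1)
v_mean = V.mean(axis=1)
C_hat = V @ V.conj().T / N - np.outer(v_mean, v_mean.conj())

modulus_sq = np.real(V.conj().T @ y)            # equals |w^H x_k|^2 for every k
m = np.real(v_mean.conj() @ y)                  # mean of |w^H x_k|^2
cost_kron = np.real(y.conj() @ C_hat @ y) + (m - 1.0) ** 2
```

The decomposition makes it plausible that minimizing the quadratic form over vectors with Kronecker structure is the core of the problem, with the offset term absorbed by the scaling of w.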

ACMA is obtained as a two-step approach to the latter minimization
problem [2]:

1. Find an orthonormal basis $Y = [y_1, \cdots, y_d]$ of independent
minimizers of $y^H \hat C_{\underline{x}}\, y$, i.e., the eigenvectors corresponding
to the d smallest eigenvalues of $\hat C_{\underline{x}}$.

2. Find a basis $\{\bar t_1 \otimes t_1, \cdots, \bar t_d \otimes t_d\}$ that spans the same linear
subspace as $\{y_1, \cdots, y_d\}$, and with $\|t_i\| = 1$, i.e., solve

$$\hat T = \arg\min_{T, M} \|Y - (\bar T \circ T)M\|_F^2, \qquad (4)$$

subject to the constraint $\mathrm{diag}(T^H T) = I$.

It was shown in [2] that $\hat T$ converges asymptotically in N to a
matrix $T = \underline{A}_0$, where $\underline{A}_0$ is equal to $\underline{A}$ except for a scaling and
permutation of its columns. In the non-whitened domain,
$\hat W = \hat R_x^{-1/2}\hat T$ converges asymptotically to $W = R_x^{-1}A_0$, the Wiener
receiver (except for the scaling and the permutation).

A performance analysis is now possible, and follows in outline
the analysis of the MUSIC and WSF DOA estimators [3], but extended
to fourth order statistics of non-Gaussian sources. The following
limitations are introduced to keep the derivations tractable.

1. N is sufficiently large, and we neglect terms of order $N^{-2}$
over terms of order $N^{-1}$. The noise power $\sigma^2$ is sufficiently
small and we neglect $\sigma^4$ over $\sigma^2$.

2. We assume that the prewhitening step is based on the true
covariance matrix $R_x$. (This is accurate for M = d.)

3. We assume that the exact solution to (4) is computed.

4.  COVARIANCE  OF  Cx 

In this and the next sections, we drop for convenience the underscore
from the notation since all variables are based on whitened
data. Our objective in this section is to find a compact approximative
expression for the covariance of $\hat C_x$, denoted by $\Omega_x$. Define

$$C_x = E\{(\bar x_k \otimes x_k)(\bar x_k \otimes x_k)^H\} - E\{\bar x_k \otimes x_k\}E\{\bar x_k \otimes x_k\}^H.$$

Using properties of cumulants, we can show that [2]

$$C_x = -[\bar A \circ A][\bar A \circ A]^H + \bar R_x \otimes R_x = -[\bar A \circ A][\bar A \circ A]^H + I. \qquad (5)$$

Furthermore, a straightforward derivation shows that

$$\mathrm{cov}\{\hat R_x\} = \frac{1}{N} C_x. \qquad (6)$$

Thus, $C_x$ is (up to the factor 1/N) the covariance of $\hat R_x$, and $\hat C_x$ is a
(biased) sample estimate of it. A second interpretation of $C_x$ is
obtained by defining a "data" sequence

$$g_k := \bar x_k \otimes x_k - E\{\bar x_k \otimes x_k\}, \qquad k = 1, \cdots, N, \qquad (7)$$

and considering its covariance and sample covariance

$$R_g := E\{g g^H\}, \qquad \hat R_g := \frac{1}{N}\sum_k g_k g_k^H.$$

It is straightforward to show that

$$E\{\hat R_g\} = R_g = C_x, \qquad \hat R_g = \hat C_x\,(1 + O(\tfrac{1}{N})).$$

Thus, $C_x$ is the covariance of $g_k$, and $\hat R_g$ is an unbiased sample
estimate of it; in first order approximation it has the same properties
as the biased estimate $\hat C_x$. Similar to (6), it follows that
$\mathrm{cov}\{\hat R_g\} = \frac{1}{N} C_g$ where

$$C_g := E\{(\bar g \otimes g)(\bar g \otimes g)^H\} - E\{\bar g \otimes g\}E\{\bar g \otimes g\}^H. \qquad (8)$$

In summary, we can prove

Theorem 1. $\Omega_x := \mathrm{cov}\{\hat C_x\} = \frac{1}{N} C_g + O(\frac{1}{N^2})$.
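Expression (5) can be checked by simulation in the simplest setting: noise-free whitened data with a unitary mixing matrix, so that $R_x = I$. The sketch below compares the sample covariance of the Kronecker vectors with $I - (\bar A \circ A)(\bar A \circ A)^H$. The dimensions, sample size and variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
d, N = 2, 200_000
G = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
A, _ = np.linalg.qr(G)                         # unitary mixing: whitened, noise-free case
S = np.exp(2j * np.pi * rng.random((d, N)))    # circular constant-modulus sources
X = A @ S

# Columns conj(x_k) kron x_k, shape (d^2, N)
V = (X.conj()[:, None, :] * X[None, :, :]).reshape(d * d, N)
Vc = V - V.mean(axis=1, keepdims=True)
C_hat = Vc @ Vc.conj().T / N                   # sample estimate of C_x

# Khatri-Rao product conj(A) o A (column-wise Kronecker product)
KR = np.stack([np.kron(A[:, i].conj(), A[:, i]) for i in range(d)], axis=1)
C_model = np.eye(d * d) - KR @ KR.conj().T     # eq. (5) with R_x = I
eigvals = np.linalg.eigvalsh(C_hat)            # ascending order
```

The d smallest eigenvalues are numerically zero in this noise-free case, and the associated eigenvectors span the range of $\bar A \circ A$, which is the subspace the first ACMA step looks for.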

It remains to find a compact description of $C_g$ in terms of our data
model. Inserting the model $x_k = As_k + n_k$ in the definition of $g_k$,
we obtain

$$g_k = A_c c_k + \tilde n_k$$

where

$$c := \sum_{i \neq j} e'_{ij}\,\bar s_i s_j = [\bar s_1 s_2, \cdots, \bar s_1 s_d, \bar s_2 s_1, \cdots]^T$$

$$A_c := [\bar a_i \otimes a_j]_{i \neq j}$$

$$\tilde n := \bar n \otimes n - \mathrm{vec}(R_n) + \bar A \bar s \otimes n + \bar n \otimes As$$

where $e'_{ij} = \mathrm{vec}'(e_i e_j^H)$, and $\mathrm{vec}'(\cdot)$ is a vectoring operator which
skips the main diagonal. The vector c is CM (with certain dependencies
among its entries). Likewise, the matrix $A_c$ skips the $\bar a_i \otimes a_i$
columns of $\bar A \otimes A$.

The model $g_k = A_c c_k + \tilde n_k$ has several properties that are similar
to that of $x_k = As_k + n_k$. However, c and $\tilde n$ are not independent
(only uncorrelated), not circularly symmetric, and $K_{\tilde n} \neq 0$. A good
approximation for $C_g$, taking into account all terms up to $O(\sigma^2)$, is
given as

Theorem  2.  Cg  =  [AC®AC]K^[AC®  AC]H  +  Rg®Rg  +  E+EH 
where 

E  =  [A®Ri/2®Ac]E,[Ac®Ri/2®A]H 
+  [Ri/2  ®  A®  Ac]E->  [Ac  ®  A  ®  R„/2]h 
^  -  Kc  +  XX(e^)(4®e'/ 

Kc  =  -[X(eiy®ey)(e5j®ey)H  +  (e'jj®e'ij)(e'jj®e'jj)H 

'  +(e'.®e';)(e',.®e',.)H] 

Ei  =  XXXef 

+e^  ®  tj  ®  ld  ®  e”  ®  e'ki  +  e^  ®  e^  ®  \d  ®  e"  ®  e'7  ( 1  -  8Jk ) 

E2  =  X  X  X  e'ji  ld  ®  ej  ®  <4 

+e?J  ®  e"  ®  ld®  ek  ®  e'kj  +  e'”  ®  e"  ®  ld  ®  ek  ®  e'  ;(  1-6/J. 

(All  indices  range  over  1  Note,  the  latter  matrices  are  data 
independent  and  simply  collections  of  ‘1  ’  entries.) 

Proof  Omitted. 

It can be shown experimentally that the term $\bar R_g \otimes R_g$ is the
dominant term, so that

$$C_g \approx \bar C_x \otimes C_x \qquad (9)$$

is a good approximation. This is the same as regarding c and $\tilde n$ as
Gaussian vectors with independent entries. Making this approximation
would lead to particularly simple results in the eigenvector
perturbation study and subsequent steps, as we basically can apply
the theory in Viberg [3].
5. EIGENVECTOR PERTURBATION

In this section we consider the statistical properties of the eigenvectors
of $\hat C_x$, a fourth order sample covariance matrix based on
non-Gaussian signals. We first give a general derivation and then
specialize to the case at hand. The generalization is needed because
most existing derivations consider Gaussian sources.

For a covariance matrix $R$ with unbiased sample estimate $\hat R$
based on N samples of a (not necessarily Gaussian) vector process,
consider the eigenvalue decompositions $R = U\Lambda U^H$, $\hat R = \hat U\hat\Lambda\hat U^H$.
If we elaborate on the equality

$$\hat R - R = (\hat U - U)\Lambda U^H - \hat R(\hat U - U)U^H + \hat U(\hat\Lambda - \Lambda)U^H$$

and assume that we partition the eigenvalue decomposition of R as

$$R = U\Lambda U^H = U_s\Lambda_s U_s^H + U_n\Lambda_n U_n^H, \qquad (10)$$

where the eigenvalues in $\Lambda_s$ are distinct and unequal to any eigenvalue
in $\Lambda_n$, then we can derive directly that in first order

$$\mathrm{vec}(P_n\hat U_s) = [I \otimes U_n][\Lambda_s \otimes I - I \otimes \Lambda_n]^{-1}[\bar U_s \otimes U_n]^H\,\mathrm{vec}(\hat R - R)$$

where $P_n = U_nU_n^H$. From the latter we can immediately find an expression
for the covariance of the "signal" eigenvectors projected
into the "noise" subspace:

Lemma 3. Let $\hat R$ be a sample covariance matrix converging to
R, and assume that R has eigenvalue decomposition (10) where the
entries in $\Lambda_s$ are distinct and unequal to any entry in $\Lambda_n$. Then

$$\mathrm{cov}\{P_n\hat U_s\} = [I \otimes U_n][\Lambda_s \otimes I - I \otimes \Lambda_n]^{-1}[\bar U_s \otimes U_n]^H \cdot \mathrm{cov}\{\hat R\} \cdot [\bar U_s \otimes U_n][\Lambda_s \otimes I - I \otimes \Lambda_n]^{-1}[I \otimes U_n]^H + o(N^{-1}). \qquad (11)$$

Essentially the same result appears in [4], but written as summations
and with a more indirect proof.

We now specialize to our situation. We have

$$R \leftrightarrow R_g = C_x$$

$$\mathrm{cov}\{\hat R\} \leftrightarrow \Omega_x = \mathrm{cov}\{\hat C_x\} = \frac{1}{N} C_g + O(N^{-2}).$$

Introduce the eigenvalue decomposition of $\hat C_x$ as

$$\hat C_x = \hat U\hat\Lambda\hat U^H = \hat U_s\hat\Lambda_s\hat U_s^H + \hat U_n\hat\Lambda_n\hat U_n^H \qquad (12)$$

where $\hat\Lambda_s$ collects the d smallest eigenvalues of $\hat C_x$. Likewise, $\hat U_s$
is a basis for the approximate null space of $\hat C_x$. Also introduce the
singular value decomposition

$$\mathcal{A} := \bar A \circ A = U_A\Sigma_A V_A^H, \qquad (13)$$

where $U_A$ has d orthonormal columns, $\Sigma_A = \mathrm{diag}[\sigma_k]$ is a $d \times d$ diagonal
matrix, and $V_A$ is $d \times d$ unitary. Let $U_A^\perp$ be the orthogonal
complement of $U_A$. It follows from (5) that the eigenvalue decomposition
of $C_x$ is given by

$$C_x = [U_A \;\; U_A^\perp]\begin{bmatrix} I - \Sigma_A^2 & \\ & I \end{bmatrix}[U_A \;\; U_A^\perp]^H. \qquad (14)$$

In view of the partitioning in (12) we set $U_s = U_A$, $\Lambda_s = I - \Sigma_A^2$,
and $\Lambda_n = I$. Inserting this in (11), we obtain

Theorem 4. $\mathrm{cov}\{P_A^\perp\hat U_s\} = \frac{1}{N} C_U + o(N^{-1})$, where

$$C_U := [\Sigma_A^{-2}U_A^T \otimes P_A^\perp]\,C_g\,[\bar U_A\Sigma_A^{-2} \otimes P_A^\perp].$$

Significant simplifications are possible if we allow the approximation
of $C_g$ in (9).

6. SUBSPACE FITTING

6.1. Cost function

The next item in the analysis is the subspace fitting problem in (4).
We can follow in outline the performance analysis technique described
in [3]. Some notational changes are necessary.

In equation (4), we computed a $d \times d$ separating beamforming
matrix $\hat T$ (in the whitened domain), with columns constrained to
have unit norm. W.l.o.g., we can further constrain the first nonzero
entry of each column to be positive real. Let $A(\theta)$ be a minimal
parametrization of such matrices. The true mixing matrix can then
be written as $A = A(\theta_0)B$, where B is a diagonal scaling matrix
which is unidentifiable by the subspace fitting. We assume that the
true parameter vector $\theta_0$ is uniquely identifiable and that $A(\theta)$ is
continuously differentiable around $\theta_0$. We proved in [2] that as
$N \to \infty$, $\hat T$ converges to $A_0 = A(\theta_0)$, and thus we can write
$\hat T = A(\hat\theta)$. In this notation, equation (4) becomes

$$A(\hat\theta) = \arg\min_{A(\theta), M} \|\hat U_s - \mathcal{A}(\theta)M\|_F^2, \qquad \mathcal{A}(\theta) := \bar A(\theta) \circ A(\theta).$$

As usual, the problem is separable, and the optimum for M given
$\mathcal{A}(\theta)$ is $\mathcal{A}(\theta)^\dagger\hat U_s$. Eliminating M, we obtain

$$A(\hat\theta) = \arg\min_{A(\theta)} \|P^\perp_{\mathcal{A}(\theta)}\hat U_s\|_F^2$$

where $P^\perp_{\mathcal{A}(\theta)} = I - \mathcal{A}(\theta)\mathcal{A}(\theta)^\dagger$. Hence we will consider the
minimization of the cost function

$$J(\theta) = \|P^\perp_{\mathcal{A}(\theta)}\hat U_s\|_F^2 = \mathrm{vec}(P^\perp_{\mathcal{A}(\theta)}\hat U_s)^H\,\mathrm{vec}(P^\perp_{\mathcal{A}(\theta)}\hat U_s). \qquad (15)$$

(This can be generalized to a weighted norm as usual.)

6.2. Covariance of $\hat\theta$

Choose a specific parametrization of $A(\theta)$. Since the columns of
$A(\theta)$ are not coupled, we can write $A(\theta) = [a(\theta_1), \cdots, a(\theta_d)]$,
where $a(\theta_j)$ is a parametrization of a unit-norm vector with real
non-negative first entry, which requires $p := 2(d - 1)$ real-valued
parameters per vector. Denote $\theta_{ij}$ the i-th parameter of $\theta_j$, and define
the derivative matrix

$$D := \left[\frac{\partial a_1}{\partial\theta_{11}}, \frac{\partial a_1}{\partial\theta_{21}}, \cdots, \frac{\partial a_2}{\partial\theta_{12}}, \cdots\right](\theta_0). \qquad (16)$$

Theorem 5. Let $\mathcal{A}_0 := \bar A(\theta_0) \circ A(\theta_0)$, $A_r := A(\theta_0) \otimes 1_p^T$,

$$\mathcal{D} := \bar A_r \circ D + \bar D \circ A_r$$

$$M := (\mathcal{A}_0^\dagger U_A)^H \otimes 1_p^T$$

$$C_U := [\Sigma_A^{-2}U_A^T \otimes P_A^\perp]\,C_g\,[\bar U_A\Sigma_A^{-2} \otimes P_A^\perp]$$

$$Q := 4[M \circ P_A^\perp\mathcal{D}]^H\,C_U\,[M \circ P_A^\perp\mathcal{D}]$$

$$H := 2[M \circ P_A^\perp\mathcal{D}]^H[M \circ P_A^\perp\mathcal{D}],$$

where $U_A$ and $\Sigma_A$ are defined in (13). For large N, the covariance
of $\hat\theta$ that minimizes the subspace fitting problem (15) is in first order
approximation

$$R_\theta := \mathrm{cov}\{\hat\theta\} = \frac{1}{N} H^{-1}QH^{-1}.$$

Proof. Omitted: along the lines of [3].
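The first-order eigenvector perturbation expression of Section 5, $\mathrm{vec}(P_n\hat U_s) = [I \otimes U_n][\Lambda_s \otimes I - I \otimes \Lambda_n]^{-1}[\bar U_s \otimes U_n]^H\mathrm{vec}(\hat R - R)$, can be verified numerically on a small Hermitian matrix with a deterministic perturbation. The matrix size, eigenvalues and perturbation level below are arbitrary assumptions; the estimated eigenvectors are phase-aligned with the true ones because eigenvectors are only defined up to a unit-modulus factor.

```python
import numpy as np

rng = np.random.default_rng(6)
n, ds = 5, 2
G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
Uq, _ = np.linalg.qr(G)
R = (Uq * np.array([1.0, 2.0, 4.0, 5.0, 7.0])) @ Uq.conj().T
R = (R + R.conj().T) / 2                       # enforce exact Hermitian symmetry

E = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
E = (E + E.conj().T) / 2
R_hat = R + 1e-5 * E                           # small Hermitian perturbation

lam, U = np.linalg.eigh(R)                     # ascending eigenvalues
_, U_hat = np.linalg.eigh(R_hat)
Us, Un = U[:, :ds], U[:, ds:]
Ls, Ln = np.diag(lam[:ds]), np.diag(lam[ds:])
Us_hat = U_hat[:, :ds]
# Remove the phase ambiguity of each estimated eigenvector.
Us_hat = Us_hat * np.exp(-1j * np.angle(np.sum(Us.conj() * Us_hat, axis=0)))

def vec(A):
    return A.reshape(-1, order="F")            # column stacking

I_s, I_n = np.eye(ds), np.eye(n - ds)
Dmat = np.kron(Ls, I_n) - np.kron(I_s, Ln)
rhs = np.kron(Us.conj(), Un).conj().T @ vec(R_hat - R)
pred = np.kron(I_s, Un) @ np.linalg.solve(Dmat, rhs)
exact = vec(Un @ Un.conj().T @ Us_hat)
err = np.linalg.norm(pred - exact)
```

The residual `err` is of second order in the perturbation, which is what allows the covariance expression (11) to be written in terms of $\mathrm{cov}\{\hat R\}$ alone.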
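6.3. Covariance of $\hat T$

It remains to map the previous result to an expression for the covariance
of the beamforming vectors. With some abuse of notation,
let $t = \mathrm{vec}(T)$, where $T = A(\theta_0)$, and let $\hat t = \mathrm{vec}(\hat T) = \mathrm{vec}(A(\hat\theta))$.
Then, for small perturbations,
$\hat t = t + \sum_{i,j} \frac{\partial t}{\partial\theta_{ij}}(\hat\theta_{ij} - \theta_{ij})$, so that $\hat t$ has
covariance

$$R_t = \left[\frac{\partial t}{\partial\theta_{11}}, \frac{\partial t}{\partial\theta_{21}}, \cdots\right] R_\theta \left[\frac{\partial t}{\partial\theta_{11}}, \frac{\partial t}{\partial\theta_{21}}, \cdots\right]^H = [(I_d \otimes 1_p^T) \circ D]\,R_\theta\,[(I_d \otimes 1_p^T) \circ D]^H, \qquad (17)$$

where D was defined in (16). The covariance of a beamformer $\hat t_j$
is the jj-th subblock of size $p \times p$ of $R_t$.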

6.4.  SINR  performance 

To allow a better interpretation of the performance of the beamformers, we derive a mapping of $R_t$ to the inverse SINR, or the INSR (interference plus noise to signal ratio), defined for a beamforming vector $t$ and array response vector $a$ of the corresponding source as (recall that $R_x = I$)

$$ \mathrm{INSR}(t) := \frac{t^H (I - a a^H)\, t}{t^H a a^H t}. $$


The optimal solution that minimizes the INSR is $t = \alpha a$ (for an arbitrary nonzero scaling $\alpha$). Consider a perturbation $\hat{t} = t + d$, where $t = \alpha a$. Then

$$ \mathrm{INSR}(\hat{t}) \approx \frac{1}{a^H a} \left( 1 - a^H a + \frac{d^H P_a^\perp d}{t^H t} \right), \tag{18} $$

where the approximation is good if $d^H P_a^\perp d \ll t^H t$. Let $\Delta := \mathrm{cov}(\hat{t}) / (t^H t)$ be a normalized (scale-invariant) definition of the covariance of $\hat{t}$. Then in the above approximation

$$ E\{\mathrm{INSR}(\hat{t})\} \approx \frac{1 - a^H a}{a^H a} + \frac{\mathrm{tr}(P_a^\perp \Delta)}{a^H a}. \tag{19} $$

The first term represents the asymptotic performance of the Wiener beamformer ($\hat{t} = a$ with $\Delta = 0$). The second term is the excess INSR due to the deviation of $\hat{t}$ from the optimum. We can simply plug in the estimates of $R_{t_j}$ from equation (17) in place of $\Delta$ to obtain the INSR corresponding to the ACMA beamformers.
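The INSR definition and its first-order form (18) can be checked numerically. The following is a minimal sketch, assuming $R_x = I$ as in the text and taking $P_a^\perp = I - a a^H / (a^H a)$ as the projector onto the orthogonal complement of $a$ (this explicit form is an assumption of the sketch; the function names are illustrative only).

```python
import numpy as np

def insr(t, a):
    """INSR(t) = t^H (I - a a^H) t / (t^H a a^H t), assuming R_x = I."""
    t = np.asarray(t, dtype=complex)
    a = np.asarray(a, dtype=complex)
    signal = abs(np.vdot(a, t)) ** 2      # t^H a a^H t
    total = np.vdot(t, t).real            # t^H t
    return (total - signal) / signal

def insr_approx(a, d, alpha=1.0):
    """First-order form (18) for t_hat = alpha * a + d."""
    a = np.asarray(a, dtype=complex)
    d = np.asarray(d, dtype=complex)
    aHa = np.vdot(a, a).real
    t = alpha * a
    # Component of d orthogonal to a (assumed projector P_a^perp applied to d)
    d_perp = d - a * (np.vdot(a, d) / aHa)
    excess = np.vdot(d_perp, d_perp).real / np.vdot(t, t).real
    return (1.0 - aHa + excess) / aHa
```

For any scaling $t = \alpha a$ the first function returns $(1 - a^H a)/(a^H a)$, the first term of (19); for a perturbation $d$ orthogonal to $a$ the two functions agree exactly, and otherwise only to first order.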

For comparison, we consider the Wiener beamformer estimated from finite samples and known $S$, or $T_W = (X X^H)^{-1} X S^H$. Let $t_W$ be one of the columns of $T_W$, and $a$ the corresponding column of $A$. The normalized covariance of $t_W$ is derived as


Aw  = 


cov(tw-a) 


1  l-aHa„  1 

1  + 


N  a  a 


so  that  for  the  expected  INSR  of  the  finite-sample  Wiener  we  find 
in  first  order  approximation 


E{INSR(%)}  = 

3  3 


d- 1  l-aHa 
N  '  (aHa)~  ' 


(20) 


7.  SIMULATIONS 

Figure 1 shows performance plots of the first source for a simulation with $d = 3$ sources, $M = 3$ antennas in a uniform linear array, source powers $B = \mathrm{diag}(1, 1.2, 0.9)$, and source angles $[0, \alpha, -\alpha]$, for varying $N$ and SNR. The figure shows the excess INSR relative to the INSR of the asymptotic Wiener beamformer, evaluated for source 1 (i.e., the second terms in (19) and (20)). The experimental results show with '+' the outcome of the original ACMA algorithm of [1], and with 'o' the algorithm as analyzed here, i.e., with prewhitening based on the true covariance matrix $R_x$, and using Gauss-Newton optimization to solve the subspace fitting step. The dotted line is the approximation resulting from (9), which is indeed very good. As is seen from the figures, the theoretical curves are a good prediction of the actual performance once $N > 30$ and SNR $> 5$ dB. The small difference in performance between the original algorithm and the analyzed algorithm is caused by the different prewhitening. Not shown in the figures are the results for weighted subspace fitting: these turned out to be virtually identical to the unweighted results.

Figure 1. Finite sample INSR in excess of the asymptotic INSR of the Wiener beamformer.
ABSTRACT

We propose a new algorithm for blind source separation (BSS), in which independent component analysis (ICA) and beamforming are combined to resolve the low-convergence problem of the nonlinear optimization in ICA. The proposed method consists of the following two parts: frequency-domain ICA with direction-of-arrival (DOA) estimation, and null beamforming based on the estimated DOA. The alternation of learning between ICA and beamforming can realize fast- and high-convergence optimization. The results of the signal separation experiments reveal that the signal separation performance of the proposed algorithm is superior to that of the conventional ICA-based BSS method.

1.  INTRODUCTION 

Blind source separation (BSS) is the approach taken to estimate original source signals using only the information of the mixed signals observed in each input channel. This technique is applicable to the realization of noise-robust speech recognition and high-quality hands-free telecommunication systems. In recent work on BSS based on independent component analysis (ICA) [1, 2], several methods in which the inverses of the complex mixing matrices are calculated in the frequency domain have been proposed to deal with the arrival lags among the elements of the microphone array system [3, 4, 5]. However, this ICA-based approach has the disadvantage of low convergence in the nonlinear optimization [6].

In this paper, we describe a new algorithm for BSS in which ICA and beamforming are combined. The proposed method consists of the following two parts: (1) frequency-domain ICA with estimation of the direction of arrival (DOA) of the sound source, and (2) null beamforming based on the estimated DOA. The alternation of learning between ICA and null beamforming can realize fast- and high-convergence optimization. The following sections describe the proposed method in detail, and it is shown that the signal separation performance of the proposed algorithm is superior to that of the conventional ICA-based BSS method.

2.  DATA  MODEL  AND  CONVENTIONAL  BSS  METHOD 

In this study, a straight-line array is assumed. The coordinates of the elements are designated as $d_k$ ($k = 1, \ldots, K$), and the directions of arrival of multiple sound sources are designated as $\theta_l$ ($l = 1, \ldots, L$) (see Fig. 1), where we deal with the case of $K = L = 2$.


Figure  1 :  Configuration  of  a  microphone  array  and  signals. 


In general, the observed signals in which multiple source signals are mixed linearly are given by the following equation in the frequency domain:

$$ X(f) = A(f) S(f), \tag{1} $$

where $X(f)$ is the observed signal vector, $S(f)$ is the source signal vector, and $A(f)$ is the mixing matrix; these are given as

$$ X(f) = [X_1(f), \cdots, X_K(f)]^T, \tag{2} $$

$$ S(f) = [S_1(f), \cdots, S_L(f)]^T, \tag{3} $$

$$ A(f) = \begin{bmatrix} A_{11}(f) & \cdots & A_{1L}(f) \\ \vdots & & \vdots \\ A_{K1}(f) & \cdots & A_{KL}(f) \end{bmatrix}. \tag{4} $$

The mixing matrix $A(f)$ is assumed to be complex-valued because we introduce a model to deal with the arrival lags among the elements of the microphone array and room reverberations.

In the frequency-domain ICA, first, the short-time analysis of observed signals is conducted by a frame-by-frame discrete Fourier transform (DFT). By plotting the spectral values in a frequency bin of each microphone input frame by frame, we consider them as a time series. Hereafter, we designate the time series as

$$ X(f, t) = [X_1(f, t), \cdots, X_K(f, t)]^T. \tag{5} $$
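The short-time analysis that produces the per-bin time series of Eq. (5) can be sketched as follows. This is only an illustrative sketch: the default frame length of 256 samples and shift of 128 samples assume the 8 kHz sampling, 32 msec frame, 16 msec shift, and Hamming window reported in the experimental conditions, and the function name and array layout are hypothetical.

```python
import numpy as np

def stft_bins(x, frame_len=256, frame_shift=128):
    """Return X[f, t, k]: DFT value of channel k in frame t at frequency bin f.

    x has shape (samples, K), one column per microphone.
    """
    x = np.asarray(x, dtype=float)
    window = np.hamming(frame_len)
    n_frames = 1 + (x.shape[0] - frame_len) // frame_shift
    # Cut overlapping frames and apply the window to every channel
    frames = np.stack([
        x[i * frame_shift: i * frame_shift + frame_len] * window[:, None]
        for i in range(n_frames)
    ])                                      # (T, frame_len, K)
    spec = np.fft.rfft(frames, axis=1)      # (T, F, K), F = frame_len // 2 + 1
    return spec.transpose(1, 0, 2)          # (F, T, K)
```

Slicing `stft_bins(x)[f]` then gives the time series $X(f, t)$ of one frequency bin, with one row per frame and one column per microphone.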


0-7803-701 1-2/01/$10.00  ©2001  IEEE 




Next, we perform signal separation using the complex-valued inverse of the mixing matrix, $W(f)$, so that the $L$ time-series outputs $Y(f, t)$ become mutually independent; this procedure can be given as

$$ Y(f, t) = W(f) X(f, t), \tag{6} $$

where

$$ Y(f, t) = [Y_1(f, t), \cdots, Y_L(f, t)]^T, \tag{7} $$

$$ W(f) = \begin{bmatrix} W_{11}(f) & \cdots & W_{1K}(f) \\ \vdots & & \vdots \\ W_{L1}(f) & \cdots & W_{LK}(f) \end{bmatrix}. \tag{8} $$

We perform this procedure with respect to all frequency bins. Finally, by applying the inverse DFT and the overlap-add technique to the separated time series $Y(f, t)$, we reconstruct the resultant source signals in the time domain.

In the conventional ICA-based BSS method, the optimal $W(f)$ is obtained by the following iterative equation [3, 7]:

$$ W_{i+1}(f) = \eta \left[ \mathrm{diag}\left( \langle \Phi(Y(f,t))\, Y^H(f,t) \rangle_t \right) - \langle \Phi(Y(f,t))\, Y^H(f,t) \rangle_t \right] W_i(f) + W_i(f), \tag{9} $$

where $\langle \cdot \rangle_t$ denotes the time-averaging operator, $i$ is used to express the value of the $i$-th step in the iterations, and $\eta$ is the step-size parameter. Also, we define the nonlinear vector function $\Phi(\cdot)$ as

$$ \Phi(Y(f,t)) \equiv [\Phi(Y_1(f,t)), \cdots, \Phi(Y_L(f,t))]^T, \tag{10} $$

$$ \Phi(Y_l(f,t)) \equiv \left[ 1 + \exp\left(-Y_l^{(R)}(f,t)\right) \right]^{-1} + j \cdot \left[ 1 + \exp\left(-Y_l^{(I)}(f,t)\right) \right]^{-1}, \tag{11} $$

where $Y_l^{(R)}(f,t)$ and $Y_l^{(I)}(f,t)$ are the real and imaginary parts of $Y_l(f,t)$, respectively.

Figure 2: Proposed algorithm combining frequency-domain ICA and beamforming.
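One step of the iterative rule (9) with the nonlinearity (11) can be written compactly for a single frequency bin. The sketch below is an illustration under the assumption that the time average $\langle \cdot \rangle_t$ is a plain mean over the available frames; the function names and the array layout (channels by frames) are hypothetical.

```python
import numpy as np

def phi(Y):
    """Eq. (11): sigmoid applied separately to the real and imaginary parts."""
    return 1.0 / (1.0 + np.exp(-Y.real)) + 1j / (1.0 + np.exp(-Y.imag))

def ica_step(W, X, eta=1e-5):
    """One update of Eq. (9) for one frequency bin.

    W: (L, K) separation matrix, X: (K, T) observed time series of that bin.
    """
    Y = W @ X                                   # Eq. (6)
    C = (phi(Y) @ Y.conj().T) / X.shape[1]      # <Phi(Y) Y^H>_t as a frame mean
    # diag(C) - C keeps only the (negated) off-diagonal cross terms
    return eta * (np.diag(np.diag(C)) - C) @ W + W
```

Because the bracket in (9) has a zero diagonal, the update is driven only by the cross terms between different outputs and vanishes when those terms average to zero.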


3.  PROPOSED  ALGORITHM

The conventional ICA method inherently has a significant disadvantage, which is due to low convergence through nonlinear optimization in ICA. In order to resolve the problem, we propose an algorithm based on the alternation of learning between ICA and beamforming; the inverse of the mixing matrix, $W(f)$, obtained through ICA is periodically substituted by the matrix based on null beamforming for a temporal initialization. The proposed algorithm is conducted by the following steps with respect to all frequency bins in parallel (see Fig. 2).

[Step 1: Initialization] Set the initial $W_{jP+i}(f)$, i.e., $W_0(f)$, to an arbitrary value, where the subscripts $i$ and $j$ are set to 0.

[Step 2: P-time ICA iteration] Optimize $W_{jP+i}(f)$ using the following $P$-time ICA iteration:

$$ W_{jP+i+1}(f) = \eta \left\{ \mathrm{diag}\left( \langle \Phi(Y(f,t))\, Y^H(f,t) \rangle_t \right) - \langle \Phi(Y(f,t))\, Y^H(f,t) \rangle_t \right\} W_{jP+i}(f) + W_{jP+i}(f), \tag{12} $$

where $i$ ($= 0, \cdots, P-1$) is increased by one every iteration.

[Step 3: DOA estimation] Estimate the DOAs of the sound sources by utilizing the directivity pattern of the array system, $F_l(f, \theta)$, which is given by

$$ F_l(f, \theta) = \sum_{k=1}^{K} W_{lk}(f) \exp\left[ j 2\pi f d_k \sin\theta / c \right], \tag{13} $$

where $W_{lk}(f)$ is the element of $W_{jP+i+1}(f)$, and $c$ is the velocity of sound. In the directivity patterns, directional nulls exist in only two particular directions. Accordingly, by obtaining statistics with respect to the directions of nulls at all frequency bins, we can estimate the DOAs of the sound sources. The DOA of the $l$-th sound source, $\hat{\theta}_l$, can be estimated as

$$ \hat{\theta}_l = \frac{2}{N} \sum_{m=1}^{N/2} \theta_l(f_m), \tag{14} $$

where $N$ is the total number of DFT points, and $\theta_l(f_m)$ represents the DOA of the $l$-th sound source at the $m$-th frequency bin. These are given by

$$ \theta_1(f_m) = \min\left[ \arg\min_\theta |F_1(f_m, \theta)|,\ \arg\min_\theta |F_2(f_m, \theta)| \right], \tag{15} $$

$$ \theta_2(f_m) = \max\left[ \arg\min_\theta |F_1(f_m, \theta)|,\ \arg\min_\theta |F_2(f_m, \theta)| \right], \tag{16} $$

where $\min[x, y]$ ($\max[x, y]$) is defined as a function that returns the smaller (larger) value among $x$ and $y$.
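The DOA estimation of Step 3 amounts to a grid search for the null of each directivity pattern followed by an average over frequency bins. The following sketch illustrates Eqs. (13)-(16) for $K = L = 2$; the speed of sound of 340 m/s, the angle grid, and all function names are assumptions made for the illustration.

```python
import numpy as np

C_SOUND = 340.0  # assumed velocity of sound c in m/s

def directivity(W_row, f, d, thetas):
    """Eq. (13): |F_l(f, theta)| for one row of W(f) over candidate angles (radians)."""
    steer = np.exp(1j * 2 * np.pi * f * np.outer(np.sin(thetas), d) / C_SOUND)
    return np.abs(steer @ W_row)            # one value per candidate angle

def doa_per_bin(W, f, d, thetas):
    """Eqs. (15)-(16): smaller and larger null direction of the two outputs."""
    nulls = [thetas[np.argmin(directivity(W[l], f, d, thetas))] for l in range(2)]
    return min(nulls), max(nulls)

def doa_estimate(W_bins, freqs, d, thetas):
    """Eq. (14): average the per-bin null directions over the frequency bins."""
    per_bin = np.array([doa_per_bin(W, f, d, thetas) for W, f in zip(W_bins, freqs)])
    return per_bin.mean(axis=0)
```

Taking the minimum and maximum in each bin, as in (15) and (16), makes the per-bin estimates independent of the output ordering, so the average in (14) is not disturbed by permutations between bins.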

[Step 4] If the $(jP+i+1)$-th iteration was the final iteration, go to step 6; otherwise go to step 5 with an increment of $j$.

[Step 5: Beamforming] Construct an alternative matrix for signal separation based on the null-beamforming technique where the DOA information obtained in the ICA section is used. In the case that the look direction is $\hat{\theta}_1$ and the directional null is steered to $\hat{\theta}_2$ (see solid line in Fig. 3), the elements of the matrix for signal separation are given as

$$ W_{11}^{(BF)}(f_m) = \exp\left[ -j 2\pi f_m d_1 \sin\hat{\theta}_1 / c \right] \times \left\{ \exp\left[ j 2\pi f_m d_1 (\sin\hat{\theta}_2 - \sin\hat{\theta}_1)/c \right] - \exp\left[ j 2\pi f_m d_2 (\sin\hat{\theta}_2 - \sin\hat{\theta}_1)/c \right] \right\}^{-1}, \tag{17} $$

$$ W_{12}^{(BF)}(f_m) = -\exp\left[ -j 2\pi f_m d_2 \sin\hat{\theta}_1 / c \right] \times \left\{ \exp\left[ j 2\pi f_m d_1 (\sin\hat{\theta}_2 - \sin\hat{\theta}_1)/c \right] - \exp\left[ j 2\pi f_m d_2 (\sin\hat{\theta}_2 - \sin\hat{\theta}_1)/c \right] \right\}^{-1}. \tag{18} $$

Also, in the case that the look direction is $\hat{\theta}_2$ and the directional null is steered to $\hat{\theta}_1$ (see broken line in Fig. 3), the elements of the matrix are given as

$$ W_{21}^{(BF)}(f_m) = -\exp\left[ -j 2\pi f_m d_1 \sin\hat{\theta}_2 / c \right] \times \left\{ -\exp\left[ j 2\pi f_m d_1 (\sin\hat{\theta}_1 - \sin\hat{\theta}_2)/c \right] + \exp\left[ j 2\pi f_m d_2 (\sin\hat{\theta}_1 - \sin\hat{\theta}_2)/c \right] \right\}^{-1}, \tag{19} $$

$$ W_{22}^{(BF)}(f_m) = \exp\left[ -j 2\pi f_m d_2 \sin\hat{\theta}_2 / c \right] \times \left\{ -\exp\left[ j 2\pi f_m d_1 (\sin\hat{\theta}_1 - \sin\hat{\theta}_2)/c \right] + \exp\left[ j 2\pi f_m d_2 (\sin\hat{\theta}_1 - \sin\hat{\theta}_2)/c \right] \right\}^{-1}. \tag{20} $$

The elements given by Eqs. (17)-(20) are inserted into $W_{jP}(f)$, where the subscript $i$ is reset to 0. Then we go back to step 2 and repeat the ICA iteration using $W_{jP}(f)$ as an initial value.
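As an illustration of Steps 4 and 5, a two-element null beamformer can also be obtained numerically, by inverting the $2 \times 2$ matrix of steering vectors toward the two estimated DOAs, instead of evaluating closed-form element expressions. The sketch below uses this numerical route; it assumes steering phases of the form $\exp[j 2\pi f d_k \sin\theta / c]$ as in the directivity pattern (13) and a speed of sound of 340 m/s, and the function names are hypothetical.

```python
import numpy as np

C_SOUND = 340.0  # assumed velocity of sound c in m/s

def null_beamformer(f, d, theta_hat):
    """Separation matrix whose l-th row has unit gain toward theta_hat[l]
    and a null toward the other estimated DOA (numerical 2x2 inverse)."""
    steering = np.exp(1j * 2 * np.pi * f * np.outer(d, np.sin(theta_hat)) / C_SOUND)
    return np.linalg.inv(steering)          # rows are the beamformer weights

def alternate(W_ica, f, d, theta_hat, is_final):
    """Steps 4-5: keep the ICA result on the final pass, otherwise
    replace it by the null-beamforming matrix for re-initialisation."""
    return W_ica if is_final else null_beamformer(f, d, theta_hat)
```

By construction the product of the returned matrix and the steering matrix is the identity, which is exactly the unit-gain and null condition for the two directions.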
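[Step 6: Ordering and scaling] Using the DOA information obtained in step 3, we detect and correct the source permutation and the gain inconsistency [8]. By applying the above-mentioned modifications, we can finally obtain the optimal $W(f)$ as follows:

$$ W(f) = \begin{cases} \begin{bmatrix} 1/F_1(f, \hat{\theta}_1) & 0 \\ 0 & 1/F_2(f, \hat{\theta}_2) \end{bmatrix} \cdot W_{jP+i+1}(f) & \text{(without permutation)}, \\[2ex] \begin{bmatrix} 0 & 1/F_2(f, \hat{\theta}_1) \\ 1/F_1(f, \hat{\theta}_2) & 0 \end{bmatrix} \cdot W_{jP+i+1}(f) & \text{(with permutation)}. \end{cases} \tag{21} $$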


4.  EXPERIMENTS  AND  RESULTS 
4.1.  Conditions  for  Experiments 

A two-element array with an interelement spacing of 4 cm is assumed. The speech signals are assumed to arrive from two directions, −30° and 40°. Two kinds of sentences, those spoken by two male and two female speakers selected from the ASJ continuous speech corpus for research, are used as the original speech samples. Using these sentences, we obtain 12 combinations with respect to speakers and source directions. In these experiments, we use the following signals as the source signals: the original speech convolved with the impulse responses specified by different reverberation times (RTs) of 0 msec, 150 msec and 300 msec. The impulse responses are recorded in a variable reverberation time room as shown in Fig. 4. The analytical conditions of these experiments are as follows: the sampling frequency is 8 kHz, the frame length is 32 msec, the frame shift is 16 msec, the window function is a Hamming window, the parameter $P$ is set to 100, and the step-size parameter $\eta$ for iterations is set to $1.0 \times 10^{-5}$.


Figure  3:  Example  of  directivity  patterns  in  beamforming. 
Figure  4:  Layout  of  reverberant  room  used  in  experiments.


4.2.  Objective  Evaluation  of  Separated  Signal 

In order to compare the performance of the proposed algorithm with that of the conventional BSS described in Sect. 2 for different iteration points in ICA, the noise reduction rate (NRR), defined as the output signal-to-noise ratio (SNR) in dB minus the input SNR in dB, is shown in Figs. 5(a)-(c). These values are averages over all of the combinations with respect to speakers and source directions.

In Fig. 5(a), for the nonreverberant test, it is evident that the separation performance of the proposed algorithm is superior to that of the conventional ICA-based BSS method at every iteration after 100 iterations. For example, the proposed method improves the NRR by about 6.4 dB at the 200-iteration point. As for the results of DOA estimation, Fig. 6 shows the average and deviation of the estimated DOA at each frequency corresponding to −30°. As shown in Fig. 6, the proposed algorithm can update $W(f)$ properly with a more accurate estimation of DOA compared with the conventional method (the same tendency was shown at 40°). This contributes to the realization of fast and high convergence through the optimization of $W(f)$ in the proposed algorithm under the nonreverberant condition.

As shown in Figs. 5(b) and (c), the reverberant tests show that the performance of the proposed algorithm is superior to that of the conventional ICA-based BSS method at every iteration after 100 iterations. For example, the proposed method improves the NRR by about 2.4 dB (RT = 150 msec) and 0.7 dB (RT = 300 msec) at the 200-iteration point. Although null beamforming is not suitable for signal separation under the condition that the direct sounds and their reflections exist, we can confirm that the utilization of null beamforming for temporal initialization through ICA iterations is effective for improving the separation performance, even under reverberant conditions.

Figure 5: Noise reduction rates for different iterations in ICA in the case that the RT is (a) 0 msec, (b) 150 msec, and (c) 300 msec.
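The noise reduction rate used in this evaluation is the output SNR in dB minus the input SNR in dB. A minimal sketch of that arithmetic follows; it assumes that the target and interfering components can be observed separately at the input and at the output, which is an assumption of the illustration and not a statement about how the measurements were actually taken. The function names are hypothetical.

```python
import numpy as np

def snr_db(target, interference):
    """SNR in dB from separately available target and interference signals."""
    p_target = np.sum(np.abs(target) ** 2)
    p_interf = np.sum(np.abs(interference) ** 2)
    return 10.0 * np.log10(p_target / p_interf)

def nrr_db(out_target, out_interf, in_target, in_interf):
    """Noise reduction rate: output SNR (dB) minus input SNR (dB)."""
    return snr_db(out_target, out_interf) - snr_db(in_target, in_interf)
```

For instance, leaving the target unchanged while attenuating the interference amplitude by a factor of 10 corresponds to an NRR of 20 dB.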

5.  CONCLUSION 

In this paper, we described a fast- and high-convergence algorithm for BSS where null beamforming is used for temporal initialization through ICA iterations. The results of the signal separation experiments reveal that the signal separation performance of the proposed algorithm is superior to that of the conventional ICA-based BSS method, and that the utilization of null beamforming in ICA is effective for improving the separation performance and convergence, even under reverberant conditions. In the future, further investigations regarding the adjustment of the periodical-alternation parameter, e.g., $P$, will be required, and we will apply the proposed method to a noise-robust speech recognition system.


Figure 6: Average and deviation of estimated DOA at each frequency corresponding to −30° under the nonreverberant condition.


6.  ACKNOWLEDGEMENT 

This  work  was  partly  supported  by  CREST  (Core  Research  for 
Evolutional  Science  and  Technology)  in  Japan. 

7.  REFERENCES 

[1] P. Comon, “Independent component analysis, a new concept?,” Signal Processing, vol. 36, pp. 287-314, 1994.

[2] A. Bell and T. Sejnowski, “An information-maximization approach to blind separation and blind deconvolution,” Neural Computation, vol. 7, pp. 1129-1159, 1995.

[3] N. Murata and S. Ikeda, “An on-line algorithm for blind source separation on speech signals,” Proceedings of 1998 International Symposium on Nonlinear Theory and Its Application (NOLTA ’98), vol. 3, pp. 923-926, Sep. 1998.

[4] P. Smaragdis, “Blind separation of convolved mixtures in the frequency domain,” Neurocomputing, vol. 22, pp. 21-34, 1998.

[5] L. Parra and C. Spence, “Convolutive blind separation of non-stationary sources,” IEEE Trans. Speech & Audio Process., vol. 8, pp. 320-327, 2000.

[6] H. Saruwatari, S. Kurita, K. Takeda, F. Itakura, and K. Shikano, “Blind source separation based on subband ICA and beamforming,” Proc. ICSLP2000, vol. 3, pp. 94-97, Oct. 2000.

[7] A. Cichocki and R. Unbehauen, “Robust neural networks with on-line learning for blind identification and blind separation of sources,” IEEE Trans. Circuits and Systems I, vol. 43, no. 11, pp. 894-906, 1996.

[8] S. Kurita, H. Saruwatari, S. Kajita, K. Takeda, and F. Itakura, “Evaluation of blind signal separation method using directivity pattern under reverberant conditions,” Proc. ICASSP2000, vol. 5, pp. 3140-3143, June 2000.




RECOGNITION  OF  FACIAL  IMAGES  USING 
SUPPORT  VECTOR  MACHINES 

K. I. Kim, J. Kim, K. Jung

A. I. Lab, CS Dept., Korea Advanced Institute of Science and Technology, Taejon, 305-701, Korea

School of Electrical and Computer Engineering, Sungkyunkwan University, Suwon, 440-746, Korea


ABSTRACT 

A novel support vector machine (SVM)-based method for appearance-based face recognition is presented. The proposed method does not use any external feature extraction process. Accordingly, the intensities of the raw pixels that make up the face pattern are fed directly to the SVM. However, it takes account of prior knowledge about facial structures in the form of a kernel embedded in the SVM architecture. The new kernel efficiently explores spatial relationships among potential eye, nose, and mouth objects and is compared with existing kernels. Experiments with the ORL database show a recognition rate of 98% and a speed of 0.22 seconds per face with 40 classes.

1.  INTRODUCTION 

It is reported that sales of identity verification products exceed $100 million [1]. Accordingly, many methods have been developed for easy and reliable identification. Among them, face recognition has the benefit of being a passive, nonintrusive system for verifying personal identity. This paper presents a face recognition method designed for applications such as security monitoring and location tracking. In these applications, multiple images per person are often available for training, and real-time recognition is required [2]. To allow the system to run in real time, the proposed method excludes any time-consuming feature extraction or pre-processing stage. Instead, the gray values of the raw pixels that make up the face pattern are fed directly to the recognizer. In order to absorb the resulting high dimensionality of the input space, support vector machines (SVMs), which are known to work well even in high-dimensional spaces, are used as the face recognizer.

This idea is somewhat similar to recent applications of SVMs [3][4]. However, the method proposed here differs in that it takes account of prior knowledge about facial structures and uses this in the form of a kernel (called a local correlation kernel) that is embedded in the SVM architecture. A brief introduction to SVMs and the use of prior knowledge for face recognition are given in Section 2. Section 3 presents the performance results of the proposed method when using the ORL database [5]. It was found that the proposed method correctly recognized 98.0% of the face patterns with a speed of 0.22 seconds per face with 40 classes. The conclusions and directions for future research are given in Section 4.


2.  SUPPORT  VECTOR  MACHINES  FOR 
FACE  RECOGNITION 


An SVM constructs a binary classifier from a set of patterns called training examples, which are available prior to classification. Let $(x_i, y_i) \in R^N \times \{\pm 1\}$, $i = 1, \ldots, l$, be such a set of training examples. The classifier constructs a linear decision surface (hyperplane) of the form:

$$ f(x) = \mathrm{sgn}\left( \sum_{i=1}^{l^*} y_i \alpha_i\, x_i^* \cdot x + b \right), \tag{1} $$

where $\{x_i^*\}_{i=1}^{l^*}$ is a subset of the training data set. These are called support vectors (SVs) and are the points from the data set that fall closest to the separating hyperplane. The coefficients $\alpha_i$ and $b$ are determined by solving a large-scale quadratic programming problem [6]. This hyperplane is known to minimize the bound on its VC-dimension and accordingly has been shown to provide high generalization performance even in high-dimensional spaces [6]. However, since it is unlikely that a general pattern classification problem can actually be solved by a linear classifier, the SVM needs to be augmented in order to allow for non-linear decision surfaces. The basic idea is to map the data into another dot product space (called the feature space) $F$ via a nonlinear map

$$ \Phi: R^N \to F \tag{2} $$

and perform the above linear algorithm in $F$. Since the solution has the form

$$ f(x) = \mathrm{sgn}\left( \sum_{i=1}^{l^*} y_i \alpha_i\, \Phi(x_i^*) \cdot \Phi(x) + b \right), \tag{3} $$

it is nonlinear in the original input variables.
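The decision rule in (1) and (3) is a signed, weighted sum of dot products between the input and the support vectors. The sketch below evaluates it for given coefficients; it does not solve the quadratic program that determines $\alpha_i$ and $b$, and the function names and the optional `kernel` argument (which stands in for the feature-space dot product) are illustrative assumptions.

```python
import numpy as np

def linear_kernel(u, v):
    """Plain dot product, which gives the linear classifier of Eq. (1)."""
    return float(np.dot(u, v))

def svm_decide(x, support_vectors, labels, alphas, b, kernel=linear_kernel):
    """Evaluate sgn(sum_i y_i * alpha_i * k(x_i*, x) + b) for one input x."""
    score = sum(
        y * a * kernel(sv, x)
        for sv, y, a in zip(support_vectors, labels, alphas)
    ) + b
    return 1 if score >= 0 else -1
```

Passing a different `kernel` callable replaces the dot product in the input space by a dot product in a feature space, which is how the nonlinear form (3) is evaluated without computing the map explicitly.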


In SVMs, the mapping $\Phi$ is usually performed by the kernel function as defined by:

$$ k(x, y) = \Phi(x) \cdot \Phi(y). \tag{4} $$

Then, by selecting the proper kernels $k$, various mappings (or feature extractions) $\Phi$ can be indirectly induced [6]. One of these mappings can be achieved by taking the $p$-order correlations between the entries, $x_i$, of the input vector $x$. It should be noted that these features cannot be extracted by simply computing all the correlations, since the required computation is prohibitive when $p$ is not small ($p > 2$): for $N$-dimensional input patterns, the dimensionality of the feature space $F$ is $(N + p - 1)! / (p!\,(N - 1)!)$. However, this is facilitated by the introduction of a polynomial kernel, as a polynomial kernel with degree $p$ ($k(x, y) = (x \cdot y)^p$) corresponds to the dot product of two monomial mappings, $\Phi_p$ [6]:

$$ (\Phi_p(x) \cdot \Phi_p(y)) = \sum_{i_1, \ldots, i_p = 1}^{N} x_{i_1} \cdots x_{i_p} \cdot y_{i_1} \cdots y_{i_p} = \left( \sum_{i=1}^{N} x_i y_i \right)^p = (x \cdot y)^p. \tag{5} $$

When $x$ represents an image pattern, the use of this kernel allows all possible correlations of $p$ pixels in the image to be taken into account.
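The identity between the polynomial kernel and the dot product of monomial features can be verified on a small example. The sketch below builds the explicit map of all ordered $p$-fold products (which is feasible only for tiny $N$ and $p$) and compares it with $(x \cdot y)^p$; the function names are illustrative.

```python
import itertools
import math
import numpy as np

def monomial_map(x, p):
    """All ordered products x[i1] * ... * x[ip]; the result has N**p entries."""
    return np.array([
        np.prod([x[i] for i in idx])
        for idx in itertools.product(range(len(x)), repeat=p)
    ])

def poly_kernel(x, y, p):
    """Polynomial kernel (x . y)**p, computed without the explicit map."""
    return float(np.dot(x, y)) ** p

def n_unordered_monomials(N, p):
    """Number of distinct degree-p monomials: (N + p - 1)! / (p! (N - 1)!)."""
    return math.comb(N + p - 1, p)
```

The kernel needs only one $N$-dimensional dot product, whereas the explicit feature vector grows combinatorially with $N$ and $p$, which is why the kernel form is the practical one for image-sized inputs.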


From the feature extraction viewpoint, however, the mapping $\Phi$ induced by a polynomial kernel has an important shortcoming: it does not utilize any prior knowledge, even though exploiting prior knowledge has become a common way of improving system performance [7][8]. With this observation, it is reasonable to expect that the polynomial kernel can be improved by incorporating available prior knowledge. The following set of intuitive knowledge is considered: 1. It is usually the case that images have a local structure in that not all the correlations between image regions carry equal amounts of information [8]. 2. The human face is a complex and meaningful pattern that contains most of its information in its structure. A human face is therefore expressed as a composition of its components (or objects) such as eyes, nose, mouth, etc., and can be well represented by exhibiting the features of such individual objects and the context between them [7].


Figure  1.  Architecture  of  local  correlation  kernel. 


The local correlation kernel presented by Schölkopf et al. gives a way of utilizing the first knowledge [8]. The basic idea is that the local correlations between adjacent pixels are computed first, and then the long-range correlations are computed only on the basis of the local correlations. The resulting kernel corresponds to a dot product in a polynomial space spanned mainly by localized correlations between pixels. Fig. 1 shows the architecture of a kernel utilizing local correlations in face images. To compute $k(x, y)$ for two patterns $x$ and $y$, the products between the corresponding pixels of the localized regions in the two images are summed (indicated by dot products $(\cdot\,,\cdot)$), as weighted by the pyramidal receptive fields. The first nonlinearity, in the form of the exponent $p_1$, is then applied to the output. The resulting values are summed, and the $p_2$-th power of the result is taken as the value $k(x, y)$. The resulting kernel will be of order up to $p_1 \cdot p_2$; however, it does not contain all the possible pixel correlations but mainly just the local ones. In the rest of this paper, we call this kernel the pure local correlation kernel in order to distinguish it from the new local correlation kernel that will be described later.
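The computation described for the pure local correlation kernel can be sketched in a few lines. The sketch below is only an illustration of the described sequence (pixel products, locally weighted sums, exponent $p_1$, global sum, exponent $p_2$); the receptive-field size, the triangular shape used for the pyramidal weights, and the function names are assumptions, and the code is not taken from [8].

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def pyramid(size):
    """Assumed pyramidal receptive field: outer product of a triangular ramp."""
    ramp = np.minimum(np.arange(1, size + 1), np.arange(size, 0, -1)).astype(float)
    w = np.outer(ramp, ramp)
    return w / w.sum()

def local_correlation_kernel(x, y, size=3, p1=2, p2=2):
    """k(x, y) for two equally sized 2-D images x and y."""
    prod = x * y                                            # pixel-wise products
    patches = sliding_window_view(prod, (size, size))       # all local regions
    local = np.einsum("ijkl,kl->ij", patches, pyramid(size))  # weighted local sums
    return float(np.sum(local ** p1) ** p2)                 # exponent p1, sum, exponent p2
```

With a receptive field of a single pixel and $p_1 = 1$, the function reduces to the ordinary polynomial kernel $(x \cdot y)^{p_2}$, which gives a simple consistency check of the construction.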

The second knowledge is the basis of feature-based methods. While this knowledge has been effectively adopted in feature-based methods [2][7], it was not well established in appearance-based methods. The basic idea of the pure local correlation kernel can be extended to accommodate this: the knowledge suggests that a face image should be characterized using a two-level hierarchy of within-object and between-object features. In the case of correlations as the feature, the hierarchy is realized based on correlations between object features, which are defined as the correlations between the pixels constituting these objects. With a local correlation kernel, this is achieved by simply removing the pixel-level inter-object correlations (for example, the correlations between two pixels which are located in the left eye and mouth, respectively) from all the possible correlations.


Figure 2. Object region configuration: (a) coordinates of objects (eyes, nose, and mouth), (b) average face image, and (c) object configuration (left eye, right eye, nose, and mouth regions) overlaid in (b).

As mentioned in Section 1, the problem with this approach is that it requires the facial image to be analyzed into individual facial objects, while the object location task itself is not trivial and usually requires intensive computation. Accordingly, an alternative is adopted instead of directly utilizing this approach. When the target is a single frontal face image (with rather controlled zoom and pose), rough locations of some objects, as a priori information, are available even without structural analysis. Fig. 2a shows the coordinates of the objects in consideration (eyes, nose, and mouth) obtained from 200 facial images of the ORL database [5]. It can be observed that the locations of these objects neither overlap significantly with each other nor vary significantly, and accordingly can be roughly estimated. This is supported by the fact that the average of 200 frontal images still retains the shape of a human face (Fig. 2b) (eye and mouth regions are observed). Furthermore, from this, we can decompose a face image into a set of overlapping regions, each of which probably contains only one object (Fig. 2c). Then, the following strategy is adopted to improve the correlation kernel:

Restrict inter-region correlations, which may be pixel-level inter-object correlations, while retaining intra-region correlations or, hopefully, intra-object correlations.




Figure  3.  Architecture  of  modified  local  correlation  kernel  for 
facial  feature  extraction. 

This strategy is implemented in a modified form of the local correlation kernel in Fig. 3. To compute $k(x, y)$ for two facial images $x$ and $y$, it first decomposes them into a set of object regions and computes local correlation kernels of order $p_1 \cdot p_2$ for each region. The resulting kernel $z_{object}$ then represents localized characteristics of object regions. Then the global correlation is computed from only the summed products (of order $p_3$) of these kernel outputs and non-object regions. The non-object region is included to take into account information contained in objects with irregular shape (such as hair). The resulting kernel will be a polynomial kernel of order $p_1 \cdot p_2 \cdot p_3$, which differs from a standard polynomial in that it does not utilize all products of $p_1 \cdot p_2 \cdot p_3$ pixels, but mainly inter-object ones.

Since SVMs were originally developed for two-class
classification, their basic scheme is extended to multi-face
recognition by adopting a one-against-others decomposition
method. In this strategy, R different SVMs are constructed, one
for each class. Here the r-th SVM $\omega_r$ is trained on the whole
training data set in order to classify the members of class r
against the rest. Then, in the recognition phase, the index of the
SVM with the largest output for a given pattern is regarded as the
recognition result.
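
The recognition rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `one_vs_rest_predict` and the example output values are hypothetical, and the SVM outputs are assumed to be available as real-valued decision values, one column per class.

```python
import numpy as np

def one_vs_rest_predict(decision_values):
    """Return, for each pattern, the index of the SVM with the largest output.

    decision_values: array of shape (n_patterns, R); column r holds the
    real-valued output of the r-th SVM (class r against the rest).
    """
    scores = np.asarray(decision_values, dtype=float)
    return np.argmax(scores, axis=1)

# Hypothetical outputs of R = 3 SVMs for two patterns.
outputs = [[-0.8, 0.3, -1.2],
           [0.9, -0.1, 0.4]]
predicted = one_vs_rest_predict(outputs)
```

Taking the arg-max avoids having to resolve ties between several SVMs that all answer "positive" for the same pattern.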

3.  EXPERIMENTAL  RESULTS 

The system has been tested with the ORL face database [5]. This set
of faces includes ten different images of each of 40 distinct subjects. The
images are grayscale with a resolution of 92x112. For the
training and testing of the recognizer, the grayscale was linearly
normalized to lie within [-1, 1]. All experiments were performed
using 5 training images and 5 test images per person, for a total of
200 training images and 200 test images. There was no overlap
between the training and test sets. Since the recognition
performance will be affected by the selection of training images,
the reported results were obtained by training 20 recognizers¹ for
each dichotomy with different training examples (random
selection of 5 images from 10 for each subject, resulting in 5
positive and 35 negative for each SVM) and taking the
average error over all the results. The system was implemented
in Visual C++ on a Pentium III compatible CPU.
The average recognition time was 0.22 seconds for a face pattern
with 40 classes. This speed is sufficient for tasks such as security
monitoring and location tracking.


Table 1 shows the error rates with different kernel degrees $p_1$,
$p_2$, and $p_3$. The best performance was obtained with
($p_1 = 3$, $p_2 = 2$, $p_3 = 1$) and ($p_1 = 1$, $p_2 = 2$, $p_3 = 2$)
(shaded entries in the table), which yield degree 6 (= 3·2·1) and
degree 4 (= 1·2·2) correlations.


¹ Out of a total of 10!/5! = 30240 combinations.

To gain a better understanding of the relevance of the results
obtained using local correlation kernels, benchmark comparisons
with other kernels were carried out. A set of experiments was
performed using SVMs with different kernels. Table 2
summarizes the types of kernels and their parameter settings used
in the experiments. These parameters were set empirically, i.e.,
they are the values that yielded the best performance over
several experiments. Table 3 shows the recognition results. For
comparison, the result obtained from the local correlation kernel
is also presented. It should be noted that linear SVMs ranked
third, after the pure and proposed local correlation kernels. This
is because the problem was linearly separable, as the face space
was high-dimensional (92x112) and very sparse (with few
training and testing examples). The zero training error for the
linear SVMs supported this observation. In this case, making the
classification space larger than the input space is not as beneficial
as in other, linearly non-separable applications [4]. In
contrast, the superior performance of the local correlation kernel
confirms the usefulness of prior knowledge for constructing the
classification space and verifies its appropriateness for face
recognition.

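Table 2. Different kernels and their parameter settings used in the
experiments.

Kernels                                                      Parameters
None (linear SVM)                                            -
Polynomial, $k(x, y) = (x \cdot y)^p$                        $p = 3$
Gaussian, $k(x, y) = \exp(-\|x - y\|^2 / (2\sigma^2))$       $\sigma = 0.5$
Tangent hyperbolic, $k(x, y) = \tanh(x \cdot y - \theta)$    $\theta = 1.5$
Pure local correlation kernel                                ($p_1 = 3$, $p_2 = 2$)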

Table 3. Error rates of SVMs using different kernels.

Kernels                          Error rates (%)
None (linear SVM)                3.2
Polynomial                       3.4
Gaussian                         4.2
Tangent hyperbolic               5.3
Pure local correlation kernel    2.7
Local correlation kernel         2.0

Table 4. Error rates of various systems.

System                               Error rates (%)
Eigenfaces [9]                       10.0
Pseudo-2D HMM [9]                    5.0
Convolutional neural network [10]    3.8
Linear SVMs [3]                      3.0

Table 4 summarizes the performance of various systems
for which results on the ORL database are available [3][9][10].
The proposed method showed the best performance, with a
significant reduction in error rate (33.3%) relative to the second-best
system, linear SVMs [3].
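
The 33.3% figure can be checked from the tabulated error rates, assuming it is measured relative to the 3.0% error of the linear SVMs in Table 4 and uses the 2.0% error of the local correlation kernel from Table 3:

```latex
\[
\frac{3.0 - 2.0}{3.0} \times 100\% \approx 33.3\%
\]
```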

4.  CONCLUSIONS 

A  novel  SVM-based  method  is  proposed  for  appearance-based 
face  recognition.  The  proposed  method  takes  account  of  prior 
knowledge  about  facial  structures  in  the  form  of  a  kernel 
embedded  in  the  SVM  architecture.  The  new  kernel  explores 
spatial  relationships  among  potential  eye,  nose,  and  mouth 
objects  and  showed  better  performance  than  other  kernels. 

The application domain of the proposed method is not limited to
the problem of face recognition. It can also be applied to the
problems of face authentication and face detection. By shifting
the detection window over all locations within an image, a face
detection problem can be reduced to a problem of binary
classification (i.e., face class or background class). Accordingly,
further experiments are required for these two-class face
classification applications. The proposed method is insensitive to
color, which is often present in single face images, although
color is often unreliable because of the difficulty of accurate
camera calibration. However, it would also be interesting to
explore the utility of color information for face recognition.

ABSTRACT

This paper introduces an extension of conditional
entropy-constrained RVQ (CEC-RVQ) to include quantization cell shape
gain. The method is referred to as conditional entropy-constrained
trellis-coded RVQ (CEC-TCRVQ). The new design is based on
coding image vectors by taking into account their 2-D correlation
and employing a higher-order entropy model with a trellis
structure. We employed CEC-TCRVQ to code image subbands at low
bit rates. The CEC-TCRVQ coded images do well in terms of
preserving low-magnitude textures present in some images.

1.  INTRODUCTION 

For ergodic stationary sources, vector quantization (VQ) is optimal
in a rate-distortion sense for a given vector size. However, it has
not been successfully applied to image coding in the spatial domain.
One of the primary reasons is that, due to the high inter-pixel
correlation of real-world imagery, fairly large vector sizes are
needed to get good performance. Since, for a VQ implementation, the
codebook size, and hence the complexity, memory, and needed
training data size, all grow exponentially with the vector size and
the encoding rate, large vector sizes become prohibitive.

Relief can be obtained by employing a multi-stage VQ
(MSVQ), also known as residual VQ (RVQ), for image coding
purposes. Entropy-constrained residual vector quantization
(EC-RVQ) [6] is a high-performance, computationally efficient
alternative to conventional VQ for image coding. It was shown
in [5] that improved rate-distortion performance of an EC-RVQ for
image coding can be realized by exploiting adjacent vector
dependencies. The improved image coding design is called conditional
entropy-constrained RVQ (CEC-RVQ). The CEC-RVQ employed
a higher-order conditional entropy model with the multistage structure
of RVQ to achieve a reduction of as much as 40% for the same
image quality as for EC-RVQ.

In order to incorporate quantization cell shape gain,
trellis-based coding was employed in the CEC-RVQ design. The method was
called conditional entropy-constrained trellis-coded RVQ
(CEC-TCRVQ) [3]. The approach taken in CEC-TCRVQ is to employ
adjacent and stage-conditioning symbols and to select the conditioning
symbols for the higher-order model jointly over the long term.

The direct application of CEC-TCRVQ to image coding leads
to a blocky appearance of the reconstructed image. This problem
becomes more apparent at low bit rates. It was found that this
problem does not occur when we code subbands. Another advantage of
coding image subbands is that the vector dimensions need not be
large. Furthermore, coding of the various subbands can be done in
parallel and is thus suitable for real-time implementation. Pyramid
image coding is a form of subband coding and differs from
conventional subband coding in that it involves an octave-step division
of the frequency axes, whereas conventional subband coding splits
the frequency axes uniformly. The motivation behind the research
presented in this paper is to show the application of CEC-TCRVQ
to pyramid image coding.

Figure 1: Quantizing supervector using residual vector quantization

The paper is organized as follows. Section 2 provides a
review of conditional entropy-constrained RVQ. The CEC-RVQ is
then extended to include trellis coding in Section 3. Section 4
discusses the subband quantization scheme employed. The bit allocation
problem is described in Section 5. Simulation results and a
comparison with other subband coding techniques are presented in
Section 6.

2.  CONDITIONAL  ENTROPY-CONSTRAINED  RVQ 

Let

$$X = \{x_0, x_1, \ldots, x_{n-1}\}$$

be a supervector of n consecutive vectors. Each component of the
supervector is quantized by a P-stage EC-RVQ encoder as shown
in Figure 1. It is noteworthy that the cascade of stage VQs shown
in the figure enacts a direct sum codebook. That is, the direct sum
codebook associated with X is defined by summing all
combinations of the stage codevectors in $VQ_1, VQ_2, \ldots, VQ_P$. After
quantizing the supervector, the codebook indices are fed to an
entropy coder which outputs a variable-length bit sequence for each
input component of the supervector. The variable-length index
sequence for the supervector is denoted as

$$I = \{i_0, i_1, \ldots, i_{n-1}\}
   = \{(i_{0,1}, i_{0,2}, \ldots, i_{0,P}), (i_{1,1}, i_{1,2}, \ldots, i_{1,P}), \ldots,
       (i_{n-1,1}, i_{n-1,2}, \ldots, i_{n-1,P})\} \qquad (1)$$

Let P(C) be the probability that a supervector X is
represented by the codevector sequence $C = \{c_0, c_1, \ldots, c_{n-1}\}$
according to the index sequence I. Here each component $c_i$ represents
a direct sum codevector. The distortion associated with the
supervector is given by

$$d(X, C) = \sum_{i=0}^{n-1} d(x_i, c_i).$$

[Figure 2, panels (a) and (b): the two RVQ structures, from the 1st RVQ
stage to the P-th RVQ stage, and the trellis codebook labels $Q_0$, $Q_1$,
$Q_2$, $Q_3$, each formed from stage codebooks and one last-stage
sub-codebook.]

The design goal for the conditional entropy-constrained RVQ is
to minimize the Lagrangian

$$J_\lambda = E\{d(X, C)\} + \lambda E\{l(I)\}$$

where d(X, C) is the distortion between the supervector X and
the codevector sequence C, and l(I) is the length of the index
sequence I. Ideally, we choose the length of the codevector sequence
to be

$$l(I) = -\log P(C).$$

In order to minimize $J_\lambda$, we compute the Lagrangian for all
possible combinations of codevector sequences, the number of which can
grow extremely large as n increases. Large supervectors will require a
large number of additions, a large amount of storage for P(C), and
a large variable-length code.

The solution we adopt is to use first- or second-order
conditioning models to approximate the probability of occurrence of a
particular codevector sequence. Assuming a first-order conditional
model, the probability of a specific codevector sequence has the
form

$$P(C) = P(c_0) P(c_1|c_0) P(c_2|c_1) \cdots P(c_{n-1}|c_{n-2}). \qquad (2)$$

The Lagrangian associated with this model is given by

$$J_\lambda = d(x_0, c_0) - \lambda \log P(c_0)
  + \sum_{i=1}^{n-1} \{ d(x_i, c_i) - \lambda \log P(c_i|c_{i-1}) \}. \qquad (3)$$

The Lagrangian $J_\lambda$ in equation (3) is the sum of Lagrangians from
each component of the supervector, where the Lagrangian of a
component vector is given by

$$d(x_i, c_i) - \lambda \log P(c_i|c_{i-1}). \qquad (4)$$

The above equations dictate that we need to find the conditional
probabilities and then find the best codevector sequence to represent
the supervector, i.e., the one that minimizes the Lagrangian in
equation (3). For residual vector quantizers, conditioning codevectors
(or symbols) may come from the previous spatial location (intra-stage)
or from a previous residual stage location (inter-stage). The procedure
proposed in [5] is to search a small neighboring region in both intra-
and inter-stage space to find the optimal symbol for conditioning.
The candidate symbols are then arranged in a tree structure.
Subject to conditioning complexity, the BFOS algorithm [8] may be
used to determine conditioning symbols for every residual stage.

Figure 2: Conditional Entropy-constrained Trellis Coded Residual
Vector Quantizer

Once the best conditioning symbols and order are determined
for each residual stage, the next task is to compute the Lagrangian
for all the possible codevector sequences under the above
conditioning model. In [5], the authors adopted an algorithm to find the
best codevector for each component of the supervector in isolation
by minimizing the Lagrangian in equation (4).
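
The component-by-component search attributed to [5] can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `greedy_encode`, the squared-error distortion, the base-2 logarithm, and the toy codebook and probabilities in the usage lines are all assumptions made for illustration, and a single flat codebook stands in for the direct sum codebook.

```python
import numpy as np

def greedy_encode(x, codebook, p0, p_cond, lam):
    """Pick one codevector per component by minimizing equation (4) in isolation.

    x        : (n, dim) array, the supervector components x_0 ... x_{n-1}
    codebook : (K, dim) array of candidate codevectors (assumed flat codebook)
    p0       : (K,) probabilities P(c_0), all > 0
    p_cond   : (K, K) array, p_cond[j, k] = P(c_i = k | c_{i-1} = j), all > 0
    lam      : Lagrange multiplier (lambda)
    Returns the chosen indices and the accumulated Lagrangian of equation (3).
    """
    x = np.asarray(x, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    p0 = np.asarray(p0, dtype=float)
    p_cond = np.asarray(p_cond, dtype=float)
    indices, total = [], 0.0
    for i, xi in enumerate(x):
        dist = np.sum((codebook - xi) ** 2, axis=1)    # squared-error distortion (assumed)
        prob = p0 if i == 0 else p_cond[indices[-1]]   # condition on the previous choice
        cost = dist - lam * np.log2(prob)              # component Lagrangian, eq. (4)
        k = int(np.argmin(cost))
        indices.append(k)
        total += float(cost[k])
    return indices, total

# Toy scalar example with a two-entry codebook (hypothetical values).
toy_x = [[0.1], [0.9], [0.6]]
toy_codebook = [[0.0], [1.0]]
toy_p0 = [0.5, 0.5]
toy_p_cond = [[0.9, 0.1], [0.1, 0.9]]
low_lambda, _ = greedy_encode(toy_x, toy_codebook, toy_p0, toy_p_cond, 0.1)
high_lambda, _ = greedy_encode(toy_x, toy_codebook, toy_p0, toy_p_cond, 1.0)
```

With a small multiplier the choice follows the distortion; with a larger one the rate term dominates and the encoder stays on the more probable symbol.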

3.  CONDITIONAL  ENTROPY-CONSTRAINED  TCRVQ 

Let R be the encoding rate (in bits per sample) and n the
source vector dimension. The conditional entropy-constrained
trellis-coded residual vector quantizer (CEC-TCRVQ) proposed
here uses an N-state trellis with two branches entering and
leaving each state. Figure 2(c) shows a 4-state trellis with
two branches entering and leaving each state. The trellis branches are
labeled with codebooks obtained as follows. The encoding
rate R can be decomposed into stage component rates given by
$R = R_1 + R_2 + \cdots + R_P$. Let C be the first-stage expanded
codebook with $2^{nR_1+1}$ codevectors. Then C is partitioned,
in the sense of increasing intra-codebook distance, to form two
first-stage codebooks $C_{1,1}$ and $C_{2,1}$ as shown in Figure 2(a).
Two RVQs are designed next to match their respective first-stage
codebooks. The last stage of each RVQ is partitioned again to
form four sub-codebooks, i.e., $D_0$, $D_1$, $D_2$, $D_3$. The codebook
labels for the trellis, $Q_0$, $Q_1$, $Q_2$, $Q_3$, are obtained by joining
stage codebooks and last-stage sub-codebooks as shown in Figure
2(b). Selection of one or the other RVQ structure at any given
time instant depends on which trellis state we are in at that instant.
The indices from the two RVQs are subsequently entropy coded
by using their respective entropy coders.

The entropy coders employed in CEC-TCRVQ make use of
conditioning symbols, which come from the previous adjacent
vectors as well as previous residual vectors. As in the CEC-RVQ
described in [5], we search a small region in both intra-residual
stage and inter-residual stage space to find conditioning
symbols for a given residual stage in the CEC-TCRVQ design. Then
a complexity-entropy trade-off tree is constructed, with the number
of branches equal to the number of residual stages present in the
underlying RVQ. The tree is searched using the BFOS algorithm
[8] to find the best conditioning symbols, along with the conditioning
model order, for each residual stage. Then the Lagrangian of the form
of equation (3) is minimized by using the Viterbi algorithm along the
trellis structure, with the component Lagrangians of equation (4) as the
branch metric.
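
To illustrate how a Viterbi search minimizes a Lagrangian of the form of equation (3) with equation (4) as the branch metric, the following sketch runs dynamic programming over a fully connected trellis whose states are codevector indices. This is a simplification and an assumption: the paper's trellis has N states with two branches per state and codebook-labeled branches, whereas the function `viterbi_encode`, the squared-error distortion, the base-2 logarithm and the toy values below are hypothetical.

```python
import numpy as np

def viterbi_encode(x, codebook, p0, p_cond, lam):
    """Jointly choose the codevector sequence that minimizes equation (3).

    x        : (n, dim) supervector components
    codebook : (K, dim) candidate codevectors (assumed flat codebook)
    p0       : (K,) probabilities P(c_0), all > 0
    p_cond   : (K, K), p_cond[j, k] = P(c_i = k | c_{i-1} = j), all > 0
    lam      : Lagrange multiplier (lambda)
    """
    x = np.asarray(x, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    n, K = len(x), len(codebook)
    # dist[i, k] = squared error between x_i and codevector k (assumed distortion)
    dist = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    rate0 = -np.log2(np.asarray(p0, dtype=float))
    rate = -np.log2(np.asarray(p_cond, dtype=float))     # rate[j, k]
    cost = dist[0] + lam * rate0                         # best cost ending in each state
    back = np.zeros((n, K), dtype=int)                   # best predecessor per state
    for i in range(1, n):
        cand = cost[:, None] + lam * rate + dist[i][None, :]   # cand[j, k]
        back[i] = np.argmin(cand, axis=0)
        cost = np.min(cand, axis=0)
    k = int(np.argmin(cost))
    total = float(cost[k])
    path = [k]
    for i in range(n - 1, 0, -1):                        # trace the survivor path back
        k = int(back[i, k])
        path.append(k)
    return path[::-1], total

# Toy scalar example (hypothetical values).
best_path, best_cost = viterbi_encode([[0.1], [0.9], [0.6]], [[0.0], [1.0]],
                                      [0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], 0.1)
```

Unlike a choice made for each component in isolation, the survivor path accounts for how each decision changes the conditional rate of the next component.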


4.  SUBBAND  QUANTIZATION  SCHEME 

In our proposed scheme, the image is split into a pyramid, and
each pyramid is coded independently. The pyramid construction
process begins with the splitting of the image into four subbands,
and then continues with the division of the lowpass band
recursively up to the required level of decomposition. Here we used
three levels of decomposition to get ten subbands.
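
The count of ten subbands follows from the recursive split: each level contributes three highpass bands, and only the final lowpass band is kept. A small sketch, in which the helper name `pyramid_subbands` is hypothetical and the band labels follow the HL/LH/HH/LL naming used later in this section:

```python
def pyramid_subbands(levels):
    """List the subbands of an octave-band (pyramid) decomposition.

    Each level splits the current lowpass band into LL, HL, LH and HH;
    only the last LL band is retained, giving 3 * levels + 1 subbands.
    """
    bands = []
    for level in range(1, levels + 1):
        bands += [f"HL{level}", f"LH{level}", f"HH{level}"]
    bands.append(f"LL{levels}")
    return bands

bands = pyramid_subbands(3)   # three levels of decomposition
```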

We used trellis-coded residual vector quantization (TCRVQ)
[2] for designing codebooks of the image pyramids by
employing a training set of 14 (512 x 512) images. The image subbands
usually differ in their spectral contents [7]; therefore, normalized
codebooks were designed for each subband by dividing all of the
training data by their respective standard deviations. The mean of
the baseband (LL3 band) was also subtracted. Therefore, the mean
of the baseband and the standard deviations of the ten bands need
to be sent to the decoder. This overhead information corresponds
to a negligible increase in the overall bit rate.

Figure 3 shows the various trellis-coded residual vector
quantization schemes used to quantize the subbands. The LL3 band,
which contains the texture, also contains strong two-dimensional
correlation. In order to effectively exploit the correlation to
reduce the bit rate, we coded the LL3 band using three-stage
conditional entropy-constrained trellis-coded residual scalar
quantization (CEC-TCRSQ). The reason for using a scalar quantizer lies
in the fact that it is difficult to code textures using vector
quantization without producing visual artifacts. The bands HL3, HH3
and LH3 also contain some vertical and horizontal correlation,
so we employed three-stage two-dimensional conditional
entropy-constrained trellis-coded residual vector quantization. There is
little correlation present in the HL2, HH2, and LH2 bands.
Therefore, we used four-dimensional trellis-coded residual vector
quantization for these bands. The HL1, HH1 and LH1 bands contained
very small correlation and also a small amount of image energy.
Hence we needed to code them at low bit rates. In our scheme,
we coded these bands using 16-dimensional trellis-coded residual
vector quantization.


5.  BIT  ALLOCATION 

Once the quantization scheme is specified for the image pyramids,
the next issue is how to distribute the bit budget among the
subbands. Westerink, Biemond, and Boekee [11] developed an
optimal bit allocation algorithm based on the subband variance. Riskin
[8] restated their algorithm using the generalized BFOS algorithm
for both cases of convex and non-convex operational
distortion-rate functions.

We employed bit allocation using the generalized BFOS
algorithm. The BFOS algorithm can be used as follows: construct a tree
T with I subtrees, where each subtree is a unary tree and represents
a subband. In each subtree we have k nodes, where each node is
represented by an (R, D) point found during the quantization
design. If we denote the initial tree by $T_i$, the generalized BFOS
algorithm will prune off the branches of the initial tree in order to
form the final pruned tree $T_f$. In this pruning operation, the
algorithm obtains a sequence of trees where each intermediate tree
$T_{i+1}$ is obtained by pruning off the node having the smallest slope
s in the tree $T_i$. The pruned leaf node belongs to a certain subtree,
and therefore this iteration provides a new leaf node in the
previous tree. After this procedure, the ratio s must be recalculated in
this new tree $T_{i+1}$. The algorithm ends when the sum of the leaf
node rates drops below the target rate. The codebook used to
encode each subband corresponds to the codebook specified by the
leaf nodes of the final pruned tree $T_f$.
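
The pruning loop described above can be sketched as follows. This is a simplified illustration rather than the generalized BFOS algorithm of [8]: it prunes one leaf at a time, which is adequate for convex rate-distortion curves only, and it assumes that each rate is already expressed as the subband's contribution to the overall bit rate. The function name `allocate_bits` and the toy (R, D) points are hypothetical.

```python
def allocate_bits(rd_points, target_rate):
    """Greedy leaf pruning over per-subband rate-distortion points.

    rd_points   : one list per subband of (rate, distortion) pairs, sorted by
                  strictly increasing rate (decreasing distortion).
    target_rate : pruning stops once the summed leaf rates are <= this value.
    Returns the index of the selected (rate, distortion) point per subband.
    """
    leaf = [len(points) - 1 for points in rd_points]   # start from the highest rates

    def total_rate():
        return sum(points[k][0] for points, k in zip(rd_points, leaf))

    while total_rate() > target_rate:
        best, best_slope = None, None
        for b, points in enumerate(rd_points):
            k = leaf[b]
            if k == 0:
                continue                               # subband already at its lowest rate
            d_rate = points[k][0] - points[k - 1][0]
            d_dist = points[k - 1][1] - points[k][1]
            slope = d_dist / d_rate                    # distortion added per unit of rate saved
            if best_slope is None or slope < best_slope:
                best, best_slope = b, slope
        if best is None:
            break                                      # nothing left to prune
        leaf[best] -= 1                                # prune the leaf with the smallest slope
    return leaf

# Two hypothetical subbands with three (rate, distortion) points each.
toy_rd = [[(0, 10.0), (1, 4.0), (2, 2.0)],
          [(0, 5.0), (1, 4.5), (2, 4.4)]]
selection = allocate_bits(toy_rd, target_rate=2)
```

In the toy case the second subband gains little from extra bits, so its leaves are pruned first and the whole budget goes to the first subband.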


6.  SIMULATION  RESULTS 

In this section, we present results for the 512 x 512 Lena image at low
bit rates. For the bit allocation tree, we obtained thirty rate-distortion
pairs for each subband. The tree has ten branches with thirty points
on each branch. Figure 4 shows the Lena image coded at 0.123 bits per
sample using our scheme. We observe that the image coded by our
scheme is slightly blurred. We also noticed the presence
of small-magnitude texture on Lena's hat in our coded image.

Figure 4: Image Lena coded at 0.125 bits per pixel using our
proposed image coder, PSNR = 30.43 dB.

Figure 5 compares our TCRVQ-based subband coder
(TCRVQ-SBC) with other results in the literature for the test
image Lena. Kim and Modestino [4] report PSNRs of 34.04
dB, 35.28 dB, 35.98 dB, and 37.23 dB for bit rates of 0.31, 0.41,
0.48, and 0.64 bpp, respectively, for their entropy-constrained
subband coder (2-D ECSBC). Joshi, Crump and Fischer [1] developed
an arithmetic-coded trellis-coded subband image coder
(ACTCQ-SBC), which is shown to provide about 0.25 dB improvement over the
2-D ECSBC design. Sriram and Marcellin [10] report PSNRs of
34.01, 36.70, and 40.06 dB for bit rates of 0.27, 0.47, and 0.95 bits
per pixel, respectively, for their entropy-constrained trellis-coded
quantization based subband image coder (ECTCQ-SBC). SPIHT
[9] results are also displayed in the figure. The figure shows that
our coder does better than the ACTCQ-SBC and the 2-D ECSBC.
Comparing the performance of our coder with that of
ECTCQ-SBC shows that TCRVQ-SBC performance is worse by about 0.5
dB at 0.5 bits per pixel and by about 0.15 dB at 0.25 bits per
pixel. This may be because ECTCQ is a single-stage
system, in contrast to ECTCRVQ. The TCRVQ-SBC performs
worse than SPIHT by about 0.6 dB. We believe that
this gap arises because the SPIHT coder exploits inter-band
dependence while our coder does not.

ABSTRACT

Many applications require identification, segmentation
and deconvolution of textures and detection of the objects
of combined patterns. Frequency-based analysis of
patterns [1] does not avoid the redundancy which highly
deteriorates the process. On the other hand, spatial
quantifiers [2] rely on a rough estimation of the model
parameters, which is not robust enough to details and
alteration of the region size. By using HOS [3],
classification based on variation in the minimum and
maximum phase of cepstra through a spatial-based
sliding window incorporates both space and frequency
differences. A small number of minimum and maximum
phase coefficients are then evaluated for a sliding
window of fixed size. The results show an attractive
implementation of HOS estimation in texture
segmentation.

1.  INTRODUCTION 

The image segmentation process separates regions of
different statistics. Grey-level, colour and texture
segmentation are the main topics in this field. Texture
segmentation has been widely demanded due to its
application in pattern and object recognition.
Deconvolution of mixed textures in image segmentation
has been the subject of research recently. Higher order
statistics (HOS) in signal separation and deconvolution
have also been studied by many researchers
[3][4][5]. Traditional methods in texture segmentation,
such as Gabor filtering, auto-regressive
modelling, etc., fail in the deconvolution of mixtures and
the detection of the boundary between the true texture and
the contaminated one. For example, detection of objects
partially covered by nets and stains, or partitioning of
human tissue into normal and slightly malignant, can be
mentioned as suitable patterns for our experiments. The
malignant tissue is a combination of the normal cell pattern
and a non-uniform granular texture. However, most of
our experiments are on Brodatz textures and their
combinations.


The proposed method requires measurement of 3rd order
statistics and their spectrum. In the bispectrum domain, a
zero-mean quasi-Gaussian noise will be suppressed or
highly abated. This enhances the accuracy of estimation of
the signal's parameters in that domain. Application of
accurate measurement criteria and near-optimal
estimation of the pattern statistics enhance the outcome of
the process. In the next part the theoretical approach will be
explained. The implementation results come next.

2.  PRELIMINARIES 

Some images can be viewed as an original texture
partially contaminated by one or more other textures. The
textures may also be polluted by Gaussian noise. In this
case, recovering the actual texture from the mixed pattern
is required, so we can convert the question of texture
segmentation into a question of signal reconstruction.

For a minimum phase sequence, the log magnitude of its
Fourier transform and its Fourier phase form a Hilbert
transform pair. Hence, we can compute the signal's
Fourier magnitude from its Fourier phase and vice versa.
Consequently, the knowledge of only the Fourier phase
or magnitude of a minimum phase signal can lead to the
unique reconstruction of the signal. However, the
reconstruction is subject to fulfilling certain requirements.
The conditions under which a general FIR sequence can
be reconstructed from its bispectral phase only can be
stated as [6]:

Let x(k) and y(k) be two FIR sequences which are zero
outside the interval [0, N-1], and whose Z transforms have no
zeros on the unit circle, nor in reciprocal pairs. Let
$\varphi_3^x(\omega_1, \omega_2)$ and $\varphi_3^y(\omega_1, \omega_2)$
be the bispectral phases of x(k) and y(k), respectively. Also
suppose we sample the bispectral phases at $L = 2^v > 2N - 1$
equally spaced frequency points. If
$\varphi_3^x(\omega_1, \omega_2) = \varphi_3^y(\omega_1, \omega_2)$ at
the discrete frequency pairs within the non-redundant bispectrum
region $\{0 \le \omega_1 + \omega_2 \le \pi,\ \omega_2 \le \omega_1,\ \omega_1 \ge 0\}$,
then we have $x(k) = \alpha y(k - k_0)$ for some positive constant
$\alpha$ and some integer $k_0$.

In this case, let x(k) be an FIR sequence which is zero
outside the interval [0, N-1], and whose Z transform has no
zeros on the unit circle, nor in reciprocal pairs. Let
$\varphi_3^x(\omega_1, \omega_2)$ be the bispectral phase of x(k).
Suppose we sample the bispectral phase at $L = 2^v > 2N - 1$ equispaced
frequency points. The BIRA algorithm [3] can recover a
scaled and shifted version of the original signal from its
bispectral phase only. This algorithm proceeds as
follows:

Estimate the bicepstrum of x(k) and then compute the
values of A(m) - B(m); we have:

$$D(m) = A(m) - B(m), \quad m = 1, 2, \ldots, r \qquad (1)$$

where r = max(p, q), and p and q are the lengths of A(m) and
B(m), respectively. When the Fourier magnitude is
corrupted, only the differences of the computed cepstral
coefficients contain undistorted information. Note that:

$$D(m) = -2m \cdot bic_x^0(m) \qquad (2)$$

where $bic_x^0(m)$ is the initial bispectrum, i.e., at iteration
i = 0. Initially we set each sum of the cepstral coefficients
to some arbitrary value such as zero:

$$A^0(m) + B^0(m) = 0, \quad m = 1, 2, \ldots, r \qquad (3)$$

where $A^0(m)$ and $B^0(m)$ denote the values of the cepstral
coefficients at iteration i = 0. Thus we have:

$$A^0(m) + B^0(m) = -m\, p_x^0(m) = 0, \quad m = 1, 2, \ldots, r \qquad (4)$$

where $p_x^0(m)$ denotes the value of the cepstrum of x(k)
at iteration i = 0. The reconstructed signal will be
achieved after following the iterations below.

Step 1: Combine (2) and (3) for any iteration i; we have:

$$A^i(m) = \frac{D(m) - m\, p_x^i(m)}{2} \qquad (5)$$

$$B^i(m) = \frac{-D(m) - m\, p_x^i(m)}{2} \qquad (6)$$

where m = 1, 2, ..., r and i is the iteration index.

Step 2: Compute $x^i(k)$ using the following relationship:

$$x^i(k) = F^{-1}\left[\exp\left(F\left[c_x^i(m)\right]\right)\right], \quad k = 0, \ldots, M-1 \qquad (7)$$

$$c_x^i(m) = \begin{cases}
  -\frac{1}{m} A^i(m), & m > 0 \\
  0, & m = 0 \\
  \frac{1}{m} B^i(-m), & m < 0
\end{cases} \qquad (8)$$

where $x^i(k)$ is the computed sequence at each iteration
and M (M > 2r) is the length of the Fourier transform
used in equation (7).

Step 3: Generate the sequence $y^i(k)$ as follows:

$$y^i(k) = x^i(k)\, w_N(k), \quad k = 0, \ldots, M-1 \qquad (9)$$

where

$$w_N(k) = \begin{cases}
  1, & -k_0 \le k \le N - k_0 - 1 \\
  0, & \text{otherwise}
\end{cases} \qquad (10)$$

where $k_0$ is the time shift introduced to the signal due to
its reconstruction from its cepstrum coefficients and $w_N$
is a window of size N. Due to this shift, $x^i(k)$ will
appear in the interval $[-k_0, N-k_0-1]$, and is computed from
(7). $k_0$ is identified by using an iterative algorithm. This
will be discussed later in this part.


Step 4: Calculate the power cepstrum of $x^i(k)$.

(11)

and set

$$p_x^{i+1}(m) = p_y^i(m) \qquad (12)$$

Repeat Steps 1-4 until the reconstructed sequence $x^i(k)$
remains unchanged. In other words, if we define

$$E_i = \sum_{k=0}^{M-1} \left[x^i(k) - x^{i-1}(k)\right]^2 \qquad (13)$$

the algorithm stops at $i = I$, when $E_I < \delta$, where $\delta$ is
a very small constant.

We then obtain the result:

$$x^I(k) = a\,x(k - k_0) \qquad (14)$$

For the algorithm to converge, $k_0$ has to be identified
accurately. To determine the time shift $k_0$, we guess
an initial value for $k_0$ within $[0, N-1]$ (we can start from 0)
and apply Steps 1-4 while checking the error defined in (13) over
successive iterations. A second loop decides the value
of $k_0$: the value of $k_0$ is incremented one at a time and the
outcome is tested.

However, since x(k) must be a minimum-phase FIR
sequence that is zero outside the interval $[0, N-1]$, and
since estimation of the HOS parameters involves error, the
performance of the algorithm is not satisfactory for
some natural data. Conditioning the signal without
degrading it is one solution. In our experiments
we used a sufficiently long signal (a one-dimensional scan
of the image) and lowpass filtered the signal before
processing.

3.  TEXTURE  SEPARATION 

In our specific implementation, one signal $x(n)$
passes through two LTI channels $h_1(n)$ and $h_2(n)$, resulting in
$x_1(n)$ and $x_2(n)$. We then use the above algorithm to restore $x(n)$.
There are some prerequisites:

i. the channels $h_1(n)$ and $h_2(n)$ are finite-duration
impulse response sequences;

ii. there is no zero-pole cancellation between $X(Z)$
and the channels $H_1(Z)$ and $H_2(Z)$;

iii. $H_1(Z)$ and $H_2(Z)$ have no common zeros.

Petropulu extended the above arguments to non-linear signals
as well [7][8]. The reconstruction process operates row by row
on the image and restores the original texture from
the overlapping ones. The difference between
the original mixed texture and the final reconstructed one
is therefore expected to lie mainly in the overlapping region.

$$d_j(n) = x'_j(n) - x'_j(n-1) \qquad (15)$$

where $j$ denotes the data segment, which can be a row of
the image. $d_j(n)$ varies smoothly and with low amplitude
if there is no change in the texture. At the overlapping
section it shows a marked change. $d_j(n)$ is not
sensitive to Gaussian white noise, since the HOS
of such noise tends to zero.

Different combinations of patterns yield different
measures for $d_j(n)$. A limited number of regions introduces
a number of distinct clusters for $d_j(n)$. A differential
competitive learning (CL) neural network has been built
to cluster the $d_j(n)$ values. The network is similar to the
traditional Kohonen unsupervised NN, except that the winner
neurons are defined as those whose current and one-level-previous
values are above a threshold level. The weights
to the winner and its two adjacent neurons are updated.
This largely avoids the effect of non-Gaussian noise in the
texture and idle spikes in $d_j(n)$.


4.  EXPERIMENTAL  RESULTS 

Combinations of various Brodatz textures and their
mixtures have been used to show the performance of the
proposed algorithm. Figure 1 shows a combination of
two Brodatz textures in which one is partly overlapped by
another. The values of $d_j(n)$ have been measured for the
256x256 image of Figure 1. The image is scanned line by
line and the $d_j(n)$ values are measured for each sliding window at J
positions.

Figure 1. A Brodatz texture overlapped with another
pattern

Figure 2. $C_j(m)$ for two different regions (mixed partly by the
other texture)

The value of p has been taken to be
constant and equal to 8. Therefore only 16 values for each
$d_j(n)$ (i.e. n = -8 to 8, n ≠ 0) have been used. The number
of clusters is initially set based on the desired number of
regions. If there is no prior knowledge about the number
of regions, the number of peaks in the $d_j(n)$ values can be
taken as the maximum number of clusters.
However, in the majority of cases, where there is only one
background texture, the number of outputs can be
tentatively set to 2. Figure 2 shows $C_j(m)$ for two
different regions of the image. Finally, Figures 3(a) and 3(b)
depict the segmented regions before and after post-processing.

Figure 4 shows another pattern, in which part of
the image has been covered with another texture. Figure 5
illustrates the texture after the reconstruction process. Figures
6(a) and 6(b) show the boundary of the contaminated area
before and after post-processing respectively.

Figure 3. The boundary detected before (a) and after (b) post-processing

Figure 4. The mixed pattern

Figure 5. The reconstructed texture

Figure 6. The boundary of the contaminated region (a) before
and (b) after post-processing

5. CONCLUSIONS

Unlike frequency-based methods, which are sensitive to
noise components, the proposed method suppresses noise
in two stages. In the first stage the
components of WGN are eliminated in the HOS of the signal.
In the second stage, differential updating of the weights in
the modified CL-NN suppresses the remaining noise and
also enhances the performance of the system. The method
overcomes some shortcomings of time-based systems,
which mainly handle distinct uniform patterns. When
wavelets are applied to texture segmentation, defining
the frequency of the basis function is one of the major
issues; this requires rough prior knowledge about the
image variations. Application of HOS to image
segmentation is superior to the above established methods,
especially when a uniform pattern is followed by a region of
mixed pattern. The mixture includes the original
image; in fact, the original pattern has been
used in extracting the required prior knowledge.

ABSTRACT

This paper describes a rate-distortion (R-D) optimal
scheme for bit-plane based quantisation of complex coefficients,
which is suitable for zerotree image coding systems.
Most zerotree-type image codecs operate on real-valued
wavelet coefficients. The Dual-Tree Complex Wavelet
Transform, which has several advantages over the discrete
wavelet transform, produces complex coefficients. Our
scheme offers progressive bit-by-bit refinement of coefficient
magnitude and phase values. It ensures that refinement
decisions always maximise the expected distortion decrease.

1.  INTRODUCTION 

Zerotree-type (ZT-T) coding systems, e.g. [1] [2], are well-known
for providing efficient and effective image compression.
They belong to a larger class of bit-plane based systems
that implicitly quantise data values as a consequence
of the encoder's execution path. The coded bitstreams produced
by these systems are often progressive and embedded.

ZT-T codecs usually operate on a discrete wavelet transform
(DWT) of an image. They exploit the multiscale observation
that, when a small (insignificant) wavelet coefficient
appears in a coarser level, the coefficients in the same spatial
locations of the finer scales are likely also to be insignificant.
These codecs are more efficient when coefficients in a
local neighbourhood all have similar magnitudes. The shift
invariance of a transform's response to image features increases
the likelihood that coefficient magnitudes will be
locally correlated between and within scales.

ZT-T codecs are considered close to optimal in a rate-distortion
(R-D) sense for scalar-quantised (SQ) real values.
Because the overwhelming majority of wavelet transforms
produce real-valued coefficients, most literature analysing
the rate-distortion performance of ZT-T codecs, and proposing
improvements, tends to focus on real-valued data.

Some notable complex wavelet transforms are Daubechies'
complex wavelets [3] and the Dual-Tree Complex Wavelet
Transform (DT-CWT) [4] [5], both of which are redundant
for real image data. The DT-CWT is a perfect reconstruction
transform with Gabor-like filters. It uses two trees
per dimension, each with short, linear phase real lowpass
and highpass filters, to simulate a single complex lowpass/highpass
filter pair. The filters in the two trees of [5] are
just the time-reverse of each other, as are the analysis and
reconstruction filters. For 2-dimensional signals, the DT-CWT
has 4:1 redundancy.

The DT-CWT has several advantages over the conventional
critically-sampled DWT. It has good directional selectivity
in multiple dimensions, and can distinguish between
positive and negative signal frequencies. Significantly,
the magnitude response of the DT-CWT is approximately
shift invariant. This is a very beneficial property for ZT-T
coding systems, which rely on local interscale and intrascale
correlations of wavelet coefficient magnitudes.


real  component  magnitude  component 


Fig.  1.  Comparison  of  complex  components 

Generally, complex-valued data is quantised using scalar
quantisers independently on the data's real and imaginary
components, or using some form of vector quantisation (VQ),
such as trellis coded quantisation. The problem when quantising
complex-valued wavelet coefficients within a ZT-T
codec is the lack of correlation between related coefficients
when expressed as real and imaginary components. ('Related'
refers to the family tree structure imposed on coefficients
by ZT-T systems.) For example, the real and imaginary
components of DT-CWT coefficients exhibit shift dependence
like DWT coefficients.

Figure 1 illustrates the enhanced local correlations among
the magnitudes of complex coefficients, compared to the
magnitudes of the real components only. From the 2nd level,
vertical subband of the DT-CWT transform of the 256x256
'Lena' image, outlines of Lena's hair, hat, and mirror are
visible. The enhanced local correlations are evident from
the smoother, more continuous lines and edges in the magnitude
component image.

The purpose of this paper is to develop a bit-plane based
quantisation scheme for complex coefficients, suitable for
ZT-T codecs, that is optimal in an R-D sense.

2.  OPTIMAL  R-D  QUANTISATION 

The magnitudes of related DT-CWT coefficients are well
correlated. They are therefore more appropriate for ZT-T
codecs than the individual magnitudes of the real and imaginary
components. To extend a ZT-T codec for complex
coefficients, the coefficients' magnitudes could be the basis
for the codec's significance threshold tests (i.e. the 'Sorting
Pass' of the SPIHT algorithm [2]). The individual magnitudes
of the real and imaginary components could then be
refined one bit each for each subsequent level of the algorithm
(i.e. SPIHT's 'Refinement Pass'). The obvious inefficiency
here is that, if one component is much larger than
the other, many 0 bits are used to describe the insignificant
component.

Instead, we propose refining coefficients' magnitude and
phase components individually. The coefficients' magnitudes
are used for the ZT-T algorithm's significance-based
tests and decisions. As with real-valued coefficients, once
a coefficient is found to be significant compared to the current
threshold, its magnitude is refined by one bit at each
subsequent threshold. We assume that the usual thresholding
system applies; i.e. all thresholds can be expressed as
powers of 2, and if the threshold at level $k$ is $t_k$, then the
threshold during the next pass is $t_{k-1} = t_k/2$.
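The one-bit-per-threshold magnitude refinement described above can be sketched as an interval-halving loop. This is only an illustration under the stated power-of-2 threshold assumption; the function name `refine_magnitude` and the example values are hypothetical and not taken from any codec implementation.

```python
def refine_magnitude(r, t_k, n_bits):
    """Refine a magnitude r known to satisfy t_k <= r < 2*t_k, one bit per pass.

    Returns the refinement bits and the final uncertainty interval [lo, hi).
    """
    lo, hi = t_k, 2 * t_k
    bits = []
    for _ in range(n_bits):
        mid = (lo + hi) / 2      # each pass halves the interval
        if r >= mid:
            bits.append(1)
            lo = mid
        else:
            bits.append(0)
            hi = mid
    return bits, (lo, hi)

# Hypothetical coefficient magnitude 13, newly significant at threshold 8
bits, interval = refine_magnitude(13, 8, 3)
```

For this example the refinement bits are the binary digits of 13 below its most significant bit, and the uncertainty interval shrinks from width 8 to width 1.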

Obviously, the phases of significant coefficients must be
refined concurrently with the magnitude refinement. The issue
here is the determination of how many phase refinement
bits to process at each level. We use an R-D approach.

Let $x = r e^{j\theta}$ be the true value of a complex coefficient,
and $\hat{x}_{k,l} = \hat{r}_k e^{j\hat{\theta}_l}$ be its quantised (i.e. estimated or
reconstructed) value at level $k$ of the algorithm. The $l$ subscript
denotes the fact that phase, unlike magnitude, is not
necessarily refined 1 bit/level. We choose the squared error
(square of the $\ell_2$ norm) as our distortion measure:

$$D_x(r,\theta;\hat{r}_k,\hat{\theta}_l) = (\hat{r}_k\cos\hat{\theta}_l - r\cos\theta)^2 + (\hat{r}_k\sin\hat{\theta}_l - r\sin\theta)^2$$

The change in distortion if the next refinement bit processed
for $x$ is a magnitude refinement bit, or a phase refinement bit
respectively, is:

$$\Delta D_{k-1,l} = D_x(r,\theta;\hat{r}_k,\hat{\theta}_l) - D_x(r,\theta;\hat{r}_{k-1},\hat{\theta}_l)$$

$$\Delta D_{k,l-1} = D_x(r,\theta;\hat{r}_k,\hat{\theta}_l) - D_x(r,\theta;\hat{r}_k,\hat{\theta}_{l-1})$$

Let the rate changes associated with the above distortion
changes be $\Delta R_{k-1,l}$ and $\Delta R_{k,l-1}$. The codec should process
phase refinement bits for $x$, before the next magnitude refinement
bit, while:

$$\frac{E[\Delta D_{k,l-1}]}{E[\Delta R_{k,l-1}]} > \frac{E[\Delta D_{k-1,l}]}{E[\Delta R_{k-1,l}]} \qquad (1)$$

Without entropy compression of the coded bitstream,
the rate change due to phase or magnitude refinement is exactly
1 bit. Equation (1) reduces to the question of whether
increasing the phase quantisation precision by 1 bit is expected
to result in a larger distortion decrease than increasing
the magnitude quantisation precision by 1 bit. This is
the strategy of our R-D based complex quantiser. Before
the next algorithm level, when the next magnitude refinement
bit will be (de)coded, process phase refinement bits
while they give greater expected distortion decreases than
the next magnitude refinement bit.

2.1.  Quantisation  cells 

Before proceeding to calculation of the expected distortions,
let us briefly consider the geometry of the 2-D quantiser we
propose. Assume that, at level $k$, coefficient $x$ is newly significant,
i.e. $t_k \le |x| < t_{k+1}$. Without any phase information,
the decoder knows only that $x$ lies in the ring with
inner radius $t_k$ and outer radius $t_{k+1}$. With 1 bit of phase
information, the range of possible values of $x$ is half of the
ring; with 2 bits, the range is a quarter of the ring (figure 2),
etc.

The next refinement bit for $x$ received by the decoder reduces
the ring range segment (i.e. quantisation cell) to one
of the four overlapping segments shown in figure 2. The encoder
knows the true distortion change associated with each
segment, and can therefore decide to send the refinement bit
that is R-D optimal. However, unless the decoder can follow
the same decision paths as the encoder, the encoder must include
decision overhead information in the coded bitstream.
Any gains from using the R-D optimal bits are easily offset
by the cost of the overhead [6].

The decoder can calculate the expected distortion over
each of the four segments, and decide which one offers the
greatest expectation of distortion reduction. The encoder
must use the same decision rule to determine which type of
refinement bit to code.




Fig.  2.  Quantisation  cells  in  complex  plane 


2.2.  Expected  distortion 

Calculation of expected distortions over a quantisation cell
requires the magnitude and phase joint probability distribution
$p(r, \theta)$. Actually, because we are concerned with the
pdf only after the coefficient has become significant, when
we know the most significant bit (MSB) of its magnitude,
we want the joint conditional distribution:

$$p\left(r, \theta \mid 2^{\lfloor \log_2 r \rfloor} \le r < 2^{\lceil \log_2 r \rceil}\right)$$


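$$E[D_x(r,\theta;\hat{r},\hat{\theta})]_{t_1,0}^{t_2,a} = \int_0^a\!\int_{t_1}^{t_2} p(r,\theta)\,D_x(r,\theta;\hat{r},\hat{\theta})\,dr\,d\theta = \frac{t_2^3 - t_1^3}{3\Delta t} + \hat{r}^2 - \frac{\hat{r}\,(t_2 + t_1)}{a}\left(\sin(a - \hat{\theta}) + \sin\hat{\theta}\right) \qquad (2)$$

$$\Delta t = t_2 - t_1$$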

The reconstruction values $\hat{r}$ and $\hat{\theta}$ should minimise the
expected distortion; i.e. $\hat{x}$ should be the centroid of the ring
segment bounded by $r \in [t_1, t_2)$ and $\theta \in [0, a)$. For the
squared error distortion measure, the centroid is simply the
expectation of $x$, given that $x$ lies in the ring segment above
[7]. Therefore, the optimal reconstruction estimates are:

$$\hat{\theta} = a/2$$

$$\hat{r} = \frac{1}{2a}(t_2 + t_1)\left(\sin(a - \hat{\theta}) + \sin\hat{\theta}\right) = \frac{1}{2}(t_2 + t_1)\,\mathrm{sinc}(a/2)$$

where $\mathrm{sinc}(u) = \sin(u)/u$.
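As a numerical sanity check of these estimates, the sketch below evaluates the expected distortion for a ring segment under the uniform, independent model for r and θ and compares the closed-form reconstruction point with a coarse grid search. The function names, the example cell (t1 = 1, t2 = 2, a = π/2) and the grid resolution are illustrative assumptions.

```python
import math

def expected_distortion(t1, t2, a, r_hat, theta_hat):
    """Expected squared error for r ~ U[t1, t2), theta ~ U[0, a), independent."""
    dt = t2 - t1
    return ((t2**3 - t1**3) / (3 * dt) + r_hat**2
            - r_hat * (t1 + t2) / a
            * (math.sin(a - theta_hat) + math.sin(theta_hat)))

def optimal_reconstruction(t1, t2, a):
    """Closed-form reconstruction point: theta_hat = a/2, r_hat = (t1+t2)/2 * sinc(a/2)."""
    theta_hat = a / 2
    r_hat = 0.5 * (t1 + t2) * math.sin(a / 2) / (a / 2)
    return r_hat, theta_hat

# Hypothetical cell: quarter-ring between radii 1 and 2
t1, t2, a = 1.0, 2.0, math.pi / 2
r_opt, th_opt = optimal_reconstruction(t1, t2, a)
d_opt = expected_distortion(t1, t2, a, r_opt, th_opt)

# Coarse grid search over candidate reconstruction points
d_grid = min(expected_distortion(t1, t2, a, 0.01 * i, a * j / 200)
             for i in range(1, 301) for j in range(201))
```

The grid minimum should never fall below `d_opt`, and it approaches `d_opt` as the grid is refined.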

Using the optimal reconstruction estimates above, equation
(2) simplifies to:

$$E[D_x(r,\theta;\hat{r},\hat{\theta})]_{t_1,0}^{t_2,a} = \frac{t_2^3 - t_1^3}{3\Delta t} - \left[\frac{1}{2}(t_2 + t_1)\,\mathrm{sinc}(a/2)\right]^2 \qquad (3)$$


The encoder and decoder can use (3) to calculate the
expected distortions of the four new quantisation cells that
result from processing another refinement bit for $x$. Since
the two possible cells that result from a phase refinement bit
have the same distortions, the $E[\Delta D_{k,l-1}]$ term from (1)
can be re-written:


where $\lfloor\cdot\rfloor$ and $\lceil\cdot\rceil$ denote the floor and ceiling, respectively.

For the moment we shall use the simple and not unrealistic
assumptions that $r$ and $\theta$ are independent, and uniformly
distributed. Our magnitude model therefore assumes
that while there is high correlation amongst the MSBs of related
coefficients, all lesser bits are uncorrelated (in fact,
independent) and equiprobable.

The radial width of the ring is the decoder's uncertainty
in the magnitude of $x$. The angular width $a$ of the ring range
segment is the decoder's uncertainty in the phase of $x$. Note
that, when $p(\theta)$ is assumed uniform, the phase uncertainty
is independent of the true phase value. Consequently, to
simplify expected distortion calculations, we can treat all
ring range segments as being bounded between angles 0 and
$a$.

The expected distortion for any ring segment quantisation
cell, such as the cells shown in figure 2, is:


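$$E[\Delta D_{k,l-1}] = E[D_x(r,\theta;\hat{r},\hat{\theta})]_{t_1,0}^{t_2,a} - E[D_x(r,\theta;\hat{r},\hat{\theta})]_{t_1,0}^{t_2,a/2}$$

The two possible cells that result from processing a magnitude
refinement bit do not have the same distortions. Since
the next magnitude bit is 0 or 1 with equal probability, the
$E[\Delta D_{k-1,l}]$ term from (1) can be re-written:

$$E[\Delta D_{k-1,l}] = E[D_x(r,\theta;\hat{r},\hat{\theta})]_{t_1,0}^{t_2,a} - \frac{1}{2}\left(E[D_x(r,\theta;\hat{r},\hat{\theta})]_{t_1,0}^{(t_1+t_2)/2,\,a} + E[D_x(r,\theta;\hat{r},\hat{\theta})]_{(t_1+t_2)/2,\,0}^{t_2,\,a}\right)$$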
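The phase-versus-magnitude decision can be sketched as below, assuming the uniform, independent model for r and θ, a rate of exactly 1 bit per refinement (no entropy coding), and centroid reconstruction within each cell. The names `cell_distortion`, `phase_gain`, `magnitude_gain` and `next_bit` are hypothetical and do not come from any existing codec.

```python
import math

def cell_distortion(t1, t2, a):
    """Expected distortion of a ring segment [t1, t2) x [0, a) at its centroid."""
    r_hat = 0.5 * (t1 + t2) * math.sin(a / 2) / (a / 2)
    return (t2**3 - t1**3) / (3 * (t2 - t1)) - r_hat**2

def phase_gain(t1, t2, a):
    """Expected distortion decrease from one more phase bit (angle halves)."""
    return cell_distortion(t1, t2, a) - cell_distortion(t1, t2, a / 2)

def magnitude_gain(t1, t2, a):
    """Expected distortion decrease from one more magnitude bit.

    The radial interval splits at its midpoint; each half is equally likely.
    """
    tm = 0.5 * (t1 + t2)
    return cell_distortion(t1, t2, a) - 0.5 * (
        cell_distortion(t1, tm, a) + cell_distortion(tm, t2, a))

def next_bit(t1, t2, a):
    """Pick the refinement type with the larger expected distortion decrease."""
    return "phase" if phase_gain(t1, t2, a) > magnitude_gain(t1, t2, a) else "magnitude"

# Newly significant coefficient with no phase information yet (full ring)
first = next_bit(1.0, 2.0, 2 * math.pi)
# Same ring after many phase bits (very narrow angular width)
later = next_bit(1.0, 2.0, 0.01)
```

Because both encoder and decoder can evaluate these expressions from the current cell alone, the same choice is reached on both sides without side information.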

3.  DISCUSSION  AND  RESULTS 

3.1.  SQ  zerotree  coding  of  complex  coefficients 

We implemented two complex-coefficient extensions of
SPIHT. The coefficients are generated by applying the 2D
DT-CWT to an input image to ensure good correlation between
the magnitudes of related coefficients. In one system,
the coefficients are separated into their real and imaginary
components, and are treated as two separate pixels in
SPIHT's lists of insignificant and significant pixels. Each
2x2 neighbourhood of coefficients, as defined by the SPIHT
family tree structure, contains 8 pixels, and has 2 parents
(one real and one imaginary). The magnitudes of the complex
coefficients are used for pixel and set significance tests.

The second system separates coefficients into magnitude
and phase components. It implements the R-D based
phase refinement bit versus magnitude refinement bit decision
rule described in this paper. All of the lists and tree
structures are the same as in SPIHT, with an extra list to
manage phase refinement information. For low bit rates,
initial tests show the second system provides up to 0.5 dB
PSNR improvement over the first system at the same bit
rates.

Figure 3 shows the coding performance of the two systems
described above when applied to 8-bit 512x512 'Lena'
and 'Peppers' images. Because of the DT-CWT's 4:1 redundancy,
the performance curves in fig. 3 lie a few dB below
those achievable using critically-sampled DWTs. We are
currently investigating methods to realise fully the coding
gains the DT-CWT's properties should provide.

Fig. 3. Comparison of quantisers' performance for 'Lena' and 'Peppers'
(dotted line: real-imaginary based quantiser; solid line: magnitude-phase
based quantiser)

3.2.  VQ  zero  tree  coding  of  complex  coefficients 

Several successful codecs use vector quantisers within a ZT-T
framework. Because tree-structured VQ and multistage
VQ offer progressive refinement of codewords they are natural
choices, although the bitstreams they produce are not
fully embedded and progressive. The most significant problem
with VQ ZT-T systems is that they are very difficult to
optimise in an R-D sense. Within a given significance level
(and even between levels), bits are often spent refining vectors
with little reduction in overall distortion when those bits
would be better spent elsewhere.

With energy-normalised wavelet transforms, a few large
magnitude coefficients possess much of the energy of the
transformed data. These coefficients are difficult to code
efficiently with a vector quantiser. (The space of possible
vectors is too large, and the number of realised vectors in
a given data set is too small.) We are developing a hybrid
SQ/VQ SPIHT-like codec which combines the system
described in this paper with regular VQ coding. The
smaller coefficients are gathered into multidimensional vectors
and quantised with a tree-structured vector quantiser.
The largest coefficients are quantised using the progressive
refinement, R-D based system described in this paper.

4.  CONCLUSIONS 

Most ZT-T codecs deal with real-valued wavelet coefficients.
Transforms that produce complex coefficients, such as the
DT-CWT, can offer desirable properties, such as shift-invariance.
However, extension of bit-plane based coding to complex
coefficients is not straightforward. We developed an R-D
optimal strategy for progressive bit-by-bit refinement of
magnitude and phase values. By calculating expected distortion
changes, the encoder and decoder can make the same
decisions without the need for overhead bits. The decision
to code a magnitude or phase refinement bit is determined
by which type of bit maximises the expected distortion decrease.
We are investigating more sophisticated magnitude
pdf models, for instance pdfs conditioned on the magnitudes
of a coefficient's neighbours and parent.

ABSTRACT

This paper introduces the irregular sampling problem associated
with motion transformations embedded in image
sequences. Moving patterns in image sequences undergo a
sampling which is a function of the relative position of the object
and the sampling grid. To solve this problem, it is effective
to consider motion as a smooth invertible time-warping
transformation. Important applications are related to this
topic. One is the focalization on selected moving
areas characterized by a specific scale and specific
kinematics. Focalization and selective reconstruction can be
performed either for analysis purposes, with interpolation,
prediction, and de-noising, or for coding purposes, with transmission
of limited areas of interest. The Shannon sampling
theorem and its generalizations, such as the Kramer and Parzen
theorems, apply in this context together with Clark's theorem. Clark's
theorem shows that signals formed by warping band-limited
signals admit formulae for reconstruction from samples.
Furthermore, in this paper, the warping operators that lift
the pattern up to a trajectory are chosen as unitary irreducible
and square-integrable group representations. These
operators bring important tools to motion-selective analysis
and reconstruction, namely continuous wavelets, frames,
discrete wavelet transforms, and reproducing kernel subspaces.
In this paper, two examples are treated: motion
at constant translational velocity and at constant angular velocity.
It is shown that the analysis and reconstruction structures
directly derived from motion-based groups are equivalent to
warping the same structures from the usual affine multidimensional
group defined for space-time transformations.

Key Words: wavelets, motion detection, classification
and reconstruction, signal and system modeling.

1.  INTRODUCTION 

In this paper, the motion transformations that occur in
space-time signals like image sequences in $\mathbb{R}^2 \times \mathbb{R}$ are characterized
as a smooth (i.e. differentiable) warping of the
spatio-temporal space which lifts a still signal into a moving
signal on a trajectory. As the object moves from
one frame to the next, its sampling is irregular except in
the particular case where the displacements correspond to
an integer number of samples in the grid. The approach
developed in this paper refers to the Shannon sampling
theorem and its generalizations: Parzen's theorem for multidimensional
signals and Kramer's theorem as a general
integral transform [1, 2, 3, 4]. Clark's theorem is also relevant.
Indeed, Clark's theorem states that signals obtained
by warping band-limited signals admit formulae for reconstruction
from samples [5, 6].

In this paper, the warping models rely on the physical
structure of motion, which involves Lie algebras or Lie
groups. The warping operators are in fact constructed as
Lie group representations, i.e. operators in the Hilbert
space $H = L^2(\mathbb{R}^2 \times \mathbb{R}, dk\,d\omega)$ of the signals, where $k$ and $\omega$
stand respectively for the spatial and temporal frequency.
This technique relates these operators to important analysis
tools. One of these tools consists of multi-dimensional affine
wavelets, which correspond to the affine group of dilations
in space and translations in space-time. The deformation of
the affine group into a group of motion induces a warping of
the continuous wavelets, frames and discrete wavelets. These
warping transformations must be generated by invertible
operators, must compose with one another and must preserve
the band-limitedness of the still signals. How can
such a warping operator be built? The answer lies in the following
choice. The Lie group representations [11, 12, 13] are Unitary
Irreducible (UIR) operators in $H = L^2(\mathbb{R}^2 \times \mathbb{R}, dk\,d\omega)$
defined from a group homomorphism, i.e. a one-to-one mapping
from the group element $g \in G$ to the operator $\Pi_g$ in the
Hilbert space $H$: $g \in G \to \Pi_g$, such
that $\Pi_{g_1}\Pi_{g_2} = \Pi_{g_1 \circ g_2}$ and $\Pi_{g^{-1}} = \Pi_g^{-1}$; then $\Pi_e = I_H$.
Moreover, when this warping operator is square-integrable,
it provides a strong structure for signal analysis, decomposition
and reconstruction. This structure is made of the Continuous
Wavelet Transform (CWT) and Frames Operator (FO)
along with Reproducing Kernel Spaces (RKS), Discrete
Wavelet Transforms (DWT) or Orthonormal Bases (ONB),
and in a weaker sense, Riesz bases. Square-integrable warping
operators preserve the signal band-limitedness. Two examples
of motion warping are considered in this paper. The
first concerns translational velocity $v$ (called Galilean transformation)
[12, 13] and the second introduces the angular
velocity $\theta_1$ [11]. For the time-warpings defined as above,
this paper shows that diagrams like the following commute.

This diagram means that the analysis and reconstruction
structures (CWT, FO, bases) directly derived from motion-based
groups are equivalent to warping the same structures
defined from the usual affine multidimensional group for
space-time transformations. This scheme generalizes to
general forms of motion (deformational motion and motion
on manifolds) as long as the warping satisfies unitarity, irreducibility
and square-integrability. This paper also shows
how signals formed by the warping of band-limited signals
admit different kinds of reconstruction formulas from
samples. This theory extends beyond the limited scope of
this presentation to consider singular self-adjoint boundary-value
problems, also known as Sturm-Liouville boundary-value
problems, related to generalized special functions characterizing
motion transformations.


$I \to \mathbb{C}$. Then, the Kramer space associated with $I$ and $K$
consists of all the signals of the form

$$f(x) = \int_I K(x, k)\,\hat{f}(k)\,dk \qquad (4)$$

where $\hat{f}(k) \in L^2(I, dk)$. If there exists a countable set $E = \{x_n\}$
such that $\{K(x_n, k)\}$ is a complete orthogonal set on
$L^2(I, dk)$, then we have the following reconstruction formula

$$f(x) = \lim_{N \to \infty} \sum_{n=-N}^{+N} f(x_n)\,S_n(x) \qquad (5)$$

where


2.  WARPING  AND  SAMPLING  THEOREMS 

This section first defines time-warping operators and proceeds
to three related sampling theorems [5, 6]. The warping
transformation is defined as a space-time mapping $\gamma$:
$D = (\mathbb{R}^2 \times \mathbb{R}) \to D_\gamma = (\mathbb{R}^2 \times \mathbb{R})$. This mapping acts on
band-limited functions $f$ defined in image sequences. The
appropriate space for these finite-energy functions is the
Hilbert space $H = L^2(\mathbb{R}^2 \times \mathbb{R}, dx\,dt)$. The composition
$(f \circ \gamma)$ defines a warping operator $[\Pi_\gamma f](x) = f(\gamma(x))$ with
$f \in H$ and $x \in (\mathbb{R}^2 \times \mathbb{R})$. In the following, the notation $x$
stands for space-time variables, $\vec{x}$ for space vectors, and $t$
for time, i.e. $x = (\vec{x}, t)$. Similarly, $k$ stands for $k = (\vec{k}, \omega)$.


S„(x)  =  S(xn,x) 


fj  K(x, k)  K(xn,k)  dk 
fj  |K(x„,k)|2  dk 


If  the  kernel  K  is  chosen  as  a  Fourier  kernel,  Kramer’s  the¬ 
orem  retrieves  Shannon’s  theorem.  Let  us  proceed  further 
on  RKS. 


A  basis  {Tn}  is  a  sampling  basis  for  a  Reproducing 
Kernel  Hilbert  Space  (RKHS)  H  [6]  with  sampling  set 
{rn  £  D}  yields  a  reconstruction  formula 

=  »)*"(*)  V/£tf  (7) 


The  Shannon  sampling  theorem  has  been  generalized  by 
Parzen  [4]  for  multi-dimensional  signals,  then  in  space-time. 
Theorem  I  If  I  is  a  bounded  interval  symmetric  to  origin 
defined  as  a  the  spatio-temporal  frequency  torus  I  =  jf  x  It 
and  /(x)  is  band-limited  in  I  i.e.  f  |/(k)|2dk  <  00,  ten 


/(x) 


5^/(x  k) 


nsin[Wj(xj  -  njT ] 
Wi(xi-mT) 

i— 1,2,3 


(1) 


where  K  =  (ni,n2,n3),  m  €  Z ,  W,  =  jf-,  Is  = 

[— Wi,  +Wi],  i  =  1,  2,  and  It  =  [—  W3,  +W3].  Parzen  theo¬ 
rem  establishes  that  a  band-limited  function  /( k)  £  I  can 
be  completely  determined  by  giving  its  ordinates  on  a  grid 
of  points.  Clark’s  theorem  states  the  following. 

Theorem  2  If  D  admits  a  sampling  formula  as 


/(x)  =  y>XK)*K(x)  (2) 

K 


then,  Z)7  admits  a  sampling  formula  for  h  =  f  07 


h(x)  =  5>k)*k(x)  (3) 

K 


where  pk  =  7-1(xk)  and  <3>k  =  Tk  °  7.  When  7  is  an 
affine  transformation,  then  the  band-limitedness  of  f(t)  is 
preserved.  When  7  is  not  an  affine  transformation,  we  need 
additional  conditions  on  the  warping  operator  to  preserve 
band-limitedness.  In  the  following,  time-warping  operators 
are  derived  from  unitary  and  irreducible  square-integrable 
representations  of  groups  for  motion  transformation. 

The  Kramer’s  generalized  sampling  theorem  states: 
Theorem  3  Let  us  suppose  a  bounded  interval  I  defined  as 
above,  and  the  space  L2(I,  dk)  of  functions  f(x)  for  which 
fi  |/(k)|2dk  <  00  Let  us  further  suppose  the  existence  of 
a  kernel  K(x,  k)  £  L2(I,  dk)  for  all  x  £  R2  x  R  :  [R2  x  R]  x 


if  and  only  if  its  bi-orthogonal  basis  {Tn}  is  given  by 

Tn(x)  =  (Tn  ,  4-n)  K(®n,x)  (8) 

where  K(x n,x)  is  a  reproducing  kernel  for  the  functions 
/  £  H .  { ,  )  defines  the  inner  product.  A  reproducing  kernel 
K  for  H  is  such  that  K  :  D  x  D  C  with  K(x  1,  x2)  £  H 
for  all  xi,x2  £  D  and  f(x i)  =  fDK(x i,x2)  /(x2)dx2  for 
all  /  £  H. 
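As a minimal numerical sketch of Theorem 2 in one dimension, the following Python code samples a warped signal $h = f \circ \gamma$ at the points $p_n = \gamma^{-1}(n)$ and rebuilds it with the warped interpolating functions $\Phi_n = \Psi_n \circ \gamma$, where the $\Psi_n$ are the sinc functions of the Shannon formula. The test signal `f`, the warping `gamma`, the bisection inverse and the truncation length `n_max` are illustrative assumptions, not taken from the paper.

```python
import math

def sinc(x):
    """Normalised sinc, sin(pi x)/(pi x), with the removable singularity filled in."""
    return 1.0 if abs(x) < 1e-12 else math.sin(math.pi * x) / (math.pi * x)

def f(t):
    """Hypothetical band-limited test signal (spectrum inside [-pi, pi])."""
    return sinc((t - 0.3) / 2.0) ** 2

def gamma(t):
    """Hypothetical smooth, strictly increasing, non-affine time-warping."""
    return t + 0.2 * math.sin(t)

def gamma_inv(y, lo=-1e4, hi=1e4):
    """Invert gamma by bisection; valid because gamma is strictly increasing."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gamma(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def h(t):
    """Warped signal h = f o gamma."""
    return f(gamma(t))

def reconstruct_warped(x, n_max=200):
    """Truncated version of h(x) = sum_n h(p_n) Phi_n(x) with Phi_n(x) = sinc(gamma(x) - n)."""
    total = 0.0
    for n in range(-n_max, n_max + 1):
        p_n = gamma_inv(n)          # irregular sample location in the warped domain
        total += h(p_n) * sinc(gamma(x) - n)
    return total
```

The samples $p_n$ are irregularly spaced in the warped domain, yet the reconstruction only needs the regular Shannon basis composed with `gamma`.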


3.  CWT,  DWT  FOR  MOTION  PATHS 

This section restarts from the definition of the CWT and warps this structure along motion transformations. The condition of square-integrability imposed on the group representations implies the existence of CWTs and frames along with RKHS. A time-warping is applied in the form of a velocity-based transformation: it generates continuous Galilean wavelets, frames and RKHS. This defines new CWTs and frames along the path of a constant-velocity transformation, i.e. $x = b_0 + v\tau$. Similar constructions apply for other kinds of motion, like rotation at constant angular velocity.

Let us recall the definition of the Continuous Wavelet Transform (CWT). Let us denote by $S(x,t)$ the signal in the Hilbert space $L^2(\mathbb{R}^n \times \mathbb{R}, d^n x\, dt)$. The CWT $[W_\Psi S](g)$ is defined as a linear map $W_\Psi: L^2(\mathbb{R}^n \times \mathbb{R}, d^n x\, dt) \to L^2(G, dg)$

$$[W_\Psi S](g) = \langle \Psi_g, S \rangle = \int_{\mathbb{R}^n \times \mathbb{R}} dk\, d\omega\; \overline{\hat\Psi_g(k,\omega)}\, \hat S(k,\omega) \qquad (9)$$

The overbar and hat symbols denote complex conjugation and the Fourier domain, respectively. As a CWT, this linear map (9) is an inner product endowed with more properties than a usual cross-correlation function, since it enables perfect reconstruction through the inverse CWT. The CWT is in fact an isometry from the space of observation $H = L^2(\mathbb{R}^2 \times \mathbb{R}, dx\,dt)$ to a subspace $H_\Psi \subset L^2(G, dg)$. The space $H_\Psi$ is a space of complex-valued functions on $G$; it is a reproducing kernel space. This means the existence of a reproducing kernel, i.e. the autocorrelation of $\Psi$. The reproducing kernel $K$ is such that $K: G \times G \to \mathbb{C}$ with $K(g_1, g_2) = \langle \Psi_{g_1}, \Psi_{g_2} \rangle$ and $f(g_1) = \langle K(g_1, g_2), f(g_2) \rangle$ for all $g \in G$. The link between reproducing kernel spaces and sampling theorems has already been defined; it will be warped.


The n-dimensional spatio-temporal affine group [12], denoted here $G_1$, is made of ordered 4-tuples of elements $g = \{b, \tau, a, R\}$ where the parameters $b \in \mathbb{R}^n$, $\tau \in \mathbb{R}$, $a \in \mathbb{R}^+ \setminus \{0\}$ and $R \in SO(n)$ stand respectively for spatial translation (the Cartesian position), temporal translation, dilation (the scale) and rotation in n-dimensional space (the angular orientation). The group elements in $G_1$ have the following matrix representation

$$g = \begin{pmatrix} a R(\lambda) & v & b \\ 0 & 1 & \tau \\ 0 & 0 & 1 \end{pmatrix}, \qquad \lambda \in [0, 2\pi), \quad v = 0 \qquad (10)$$

This group is a subgroup of $GL(n+2, \mathbb{R})$. The unitary irreducible representations (UIRs) of $G_1$ are eventually given as operators $\Pi_g: \hat\Psi(k,\omega) \to [\Pi_g \hat\Psi](k,\omega)$ as

$$[\Pi_g \hat\Psi](k,\omega) = a^{n/2}\, e^{i(k \cdot b + \omega\tau)}\, \hat\Psi(k', \omega) \qquad (11)$$

with

$$k' = a R^{-1} k \qquad (12)$$

From now on, we consider $n = 2$.

Theorem 4 The UIRs of $G_1$ in the space $L^2(\mathbb{R}^2 \times \mathbb{R}, d^2k\, d\omega)$ are square-integrable. The condition of square-integrability requires that $\Psi \in L^2(\mathbb{R}^n, d^n x)$ be such that

$$\iint_{\mathbb{R}^2 \times \mathbb{R}} \frac{|\hat\Psi(\xi, \omega)|^2}{|\xi|^2}\, d^2\xi\, d\omega = C_\Psi < +\infty \qquad (13)$$

and the representation $\Pi_g \hat\Psi$ is bounded for all $g$. The variable $\xi \in \mathbb{R}^n$ is a Fourier variable.

Let us warp the group $G_1$ with a warping parameter $v \in \mathbb{R}^n$. This deformation defines a group $G_2$ called the Galilei group [13, 12], made of ordered 5-tuples of elements $g = \{b, \tau, v, a, R\}$. The parameter $v \in \mathbb{R}^n$ is in fact the velocity. This group is still a subgroup of $GL(n+2, \mathbb{R})$ but the UIRs now read

$$[\Pi_g \hat\Psi_{m_0}](k,\omega) = a^{n/2}\, e^{i(k \cdot b + \tau\omega)}\, \hat\Psi_{m_0}(k', \omega') \qquad (14)$$

with

$$k' = a R^{-1}(k + m_0 v), \qquad \omega' = \omega - \frac{m_0 |v|^2}{2} - v \cdot k \qquad (15)$$

When $v$ tends to 0, these UIRs tend to Equation (11).

Theorem 5 The UIRs of $G_2$ in the space $L^2(\mathbb{R}^2 \times \mathbb{R}, d^2k\, d\omega)$ are square-integrable. The condition of square-integrability requires that the following integral be finite

$$\iint_{\mathbb{R}^2 \times \mathbb{R}} |\hat\Psi_{m_0}(k,\omega)|^2\, I_{m_0}(k,\omega)\, dk\, d\omega < \infty \qquad (16)$$

where $I_{m_0}(k,\omega)$ is equal to

$$I_{m_0}(k,\omega) = \iint_{\mathbb{R}^2 \times \mathbb{R}} \frac{|k|^2 + 2 m_0 (\omega - \omega')}{|k|^4\, m_0}\, dk\, d\omega \qquad (17)$$

It is clear that for $m_0 \neq 0$, $v = 0$, and $\omega = \omega'$ the condition of admissibility for $G_2$ (17) is equivalent to that for $G_1$ in (13).

Let us apply a second warping with $\lambda$ as a warping parameter, $\lambda = [\theta_0 + \theta_1 \tau] \bmod 2\pi$ with $\theta_0 \in [0, 2\pi)$ and $\theta_1 \in \mathbb{R}$. $\theta_1$ is the angular velocity [11]. We have defined a new group $G_3$ composed of ordered 6-tuples of elements $g = \{b, \tau, v, \theta_0, \theta_1, a\}$. The UIRs of $G_3$ in $L^2(\mathbb{R}^2 \times \mathbb{R}, dk\, d\omega)$ are expressed as

$$[T(g) \hat\Psi_{m_0}](k,\omega) = a^{n/2}\, e^{i[R(\theta,\tau) b \cdot k + \omega\tau]}\, \hat\Psi_{m_0}(k', \omega') \qquad (18)$$

where $k'$ and $\omega'$ are as in Equation 15. The character $e^{i[R(\theta,\tau) b \cdot k + \omega\tau]}$ of the UIRs in Equation 18 introduces a special function derived by integration on $\tau$. This yields, with $\Omega = \omega/\theta_1$ and polar coordinates $k = (k, \alpha)$, $b = (r, \beta)$,

$$J_\Omega(kr) = \frac{1}{2\pi} \int_0^{2\pi} e^{i[\Omega u + kr \sin u]}\, du \qquad (19)$$

which is not a Bessel function except for $\Omega \in \mathbb{Z}$.

Let us recall the definition of a frame. A sequence of functions $\{\phi_j\}$ in a separable Hilbert space $H$ is called a frame if there exist two constants $A, B > 0$ with $B < \infty$ so that, for all $f \in H$, we have

$$A \|f\|^2 \le \sum_j |\langle f, \phi_j \rangle|^2 \le B \|f\|^2 \qquad (20)$$

where the sequence of functions $\{\phi_j\}$ is computed on a discrete lattice $j$ derived from discretizing the group parameters $b, \tau, a, \theta_0, v$. Square-integrable UIRs imply the existence of associated frames. Such frames have an associated invertible bounded operator $F: H \to H$ with $F(f) = \sum_n \langle f, \phi_n \rangle \phi_n$. This frame allows a perfect reconstruction and a sampling theorem

$$f(\mathbf{x}) = \sum_n \langle f, F^{-1}(\phi_n) \rangle\, \phi_n(\mathbf{x}) = \sum_n \langle f, \phi_n \rangle\, F^{-1}[\phi_n(\mathbf{x})] \qquad (21)$$

where $F^{-1}$ is the inverse or dual frame operator for $F$.

Discrete wavelet transforms are well defined as dyadic Multi-Resolution Analysis (MRA) wavelets $\Psi$ [7]. Let $\phi: \mathbb{R} \to \mathbb{R}$ be the continuous scaling function. $V_s$ is a RKHS with $K_s(x_1, x_2) = 2^s \sum_n \phi(2^s x_1 - n)\, \phi(2^s x_2 - n)$. The basis $\{\phi(x - n)\}$ of $V_0$ is bi-orthogonal to $\{K_0(x, n)\}$.
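To make the frame inequality (20) and the reconstruction formula (21) concrete, the following Python sketch works with a small redundant frame in $\mathbb{R}^2$ rather than with the group-generated frames of the paper. The three frame vectors and the helper names are hypothetical and serve only as a finite-dimensional illustration.

```python
import math

def inner(u, v):
    """Real inner product <u, v>."""
    return sum(a * b for a, b in zip(u, v))

def frame_operator(frame):
    """2x2 matrix of F(f) = sum_n <f, phi_n> phi_n."""
    return [[sum(p[i] * p[j] for p in frame) for j in range(2)] for i in range(2)]

def solve2(m, v):
    """Solve the 2x2 linear system m x = v by Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [(v[0] * m[1][1] - m[0][1] * v[1]) / det,
            (m[0][0] * v[1] - m[1][0] * v[0]) / det]

def frame_bounds(frame):
    """Frame bounds A and B as the eigenvalues of the symmetric frame operator."""
    F = frame_operator(frame)
    tr = F[0][0] + F[1][1]
    det = F[0][0] * F[1][1] - F[0][1] * F[1][0]
    disc = math.sqrt(tr * tr / 4.0 - det)
    return tr / 2.0 - disc, tr / 2.0 + disc

def reconstruct(f, frame):
    """f = sum_n <f, phi_n> F^{-1} phi_n, the second form of Equation (21)."""
    F = frame_operator(frame)
    duals = [solve2(F, p) for p in frame]      # dual frame vectors F^{-1} phi_n
    coeffs = [inner(f, p) for p in frame]      # frame coefficients <f, phi_n>
    return [sum(c * d[i] for c, d in zip(coeffs, duals)) for i in range(2)]

# Hypothetical redundant frame: three vectors spanning R^2.
FRAME = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
```

For this frame the bounds are $A = 1$ and $B = 3$, and `reconstruct` returns the input vector although the frame is not orthogonal.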

4.  SAMPLING  THEOREM  FOR  MOTION 
TRANSFORMATIONS 

This section shows that the UIRs deduced in Equations (14) and (18) lead naturally to a Kramer theorem for motion transformations. Let $\hat\Psi \in L^2(I, d\mathbf{k})$, let $I$ be the spatio-temporal frequency torus $I = I_s \times I_t$, and let $h = \Psi \circ \Pi_g$. If we integrate these UIRs (14, 18) on the spatio-temporal frequency torus $I$, we get in each case a generalized Fourier transform of the form ($a = 1$)

$$h(\mathbf{x}) = \frac{1}{(2\pi)^3} \int_I e^{i \gamma(\mathbf{x}) \cdot \mathbf{k}}\, \hat\Psi(\mathbf{k})\, d\mathbf{k} \qquad (22)$$

which can be restated as a Kramer sampling theorem

$$h(\mathbf{x}) = \int_I K(\mathbf{x}, \mathbf{k})\, \hat\Psi(\mathbf{k})\, d\mathbf{k} \qquad (23)$$

Therefore, motion-based warped spaces admit a reconstruction formula with $K(\mathbf{x}, \mathbf{k}) = e^{i \gamma(\mathbf{x}) \cdot \mathbf{k}}$ and $\mathbf{x}_n = \gamma^{-1}(\mathbf{n})$, where the basis $\{K(\mathbf{n}, \mathbf{k})\} = e^{i \mathbf{n} \cdot \mathbf{k}}$ is complete on the spatio-temporal torus $I$. The interpolating functions $S_n$ can be derived from Kramer's theorem.
For the Galilei group of constant velocity, the transformation $\gamma[\mathbf{x}]$ is a matrix product $A\mathbf{x}$ where $A$ is of the form

$$A = \begin{pmatrix} I_n & -v \\ 0 & 1 \end{pmatrix} \quad \text{with} \quad \mathbf{k}' = A^{T} \mathbf{k} \qquad (24)$$


where $I_n$ is the $n \times n$ unit matrix. For the rotational motion, Equation (19) leads to an integral transform of the form

$$[H_\Omega f](k) = \int_0^\infty f(r)\, J_\Omega(kr)\, r\, dr \qquad (25)$$

on which Kramer's theorem applies on the zeroes of the Bessel functions defined for $\Omega = \omega/\theta_1 \in \mathbb{Z}$, i.e. when $\omega$ is a multiple of the angular velocity. To avoid any aliasing, we need $\theta_1 < 1$. Hence, for signal processing, only $\Omega = -1, 0, 1$ matter. From Section 2, if $\Omega = m \in \mathbb{Z}$, then Kramer's theorem has the sampling function


Sm,n  (&) 


JT  /w  T )  71 )  Tdv 

/0°°  fir)  I Jm(k  r) |2  rdr 

2  km>n  Jo(k) 


(26) 

(27) 


where $k_{m,n}$ are the zeroes of the Bessel function $J_m$, i.e. $J_m(k_{m,n}) = 0$, and usual properties of Bessel functions have enabled the evaluation of the integrals in (26).
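The integral of Equation (19) and the Bessel zeroes used as sampling points can be checked numerically. The sketch below, in Python, evaluates the integral for integer order with an equally spaced sum, compares it with the power series of $J_n$, and locates the first zero of $J_0$ by bisection. With the plus sign written in (19), the integral for integer $n$ equals $J_{-n}(x) = (-1)^n J_n(x)$. Function names, step counts and the bracketing interval are illustrative assumptions.

```python
import cmath
import math

def bessel_integral(order, x, steps=256):
    """Equally spaced sum for (1/2pi) * int_0^{2pi} exp(i(order*u + x*sin u)) du.

    For integer order the integrand is 2pi-periodic, so this sum coincides
    with the trapezoid rule and converges very quickly.
    """
    step = 2.0 * math.pi / steps
    total = sum(cmath.exp(1j * (order * k * step + x * math.sin(k * step)))
                for k in range(steps))
    return total * step / (2.0 * math.pi)

def bessel_series(n, x, terms=40):
    """Power series of the Bessel function J_n for integer n >= 0."""
    return sum((-1) ** m / (math.factorial(m) * math.factorial(m + n))
               * (x / 2.0) ** (2 * m + n) for m in range(terms))

def first_zero_j0(lo=2.0, hi=3.0):
    """Bisection for the first positive zero of J_0, bracketed in [lo, hi]."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if bessel_series(0, lo) * bessel_series(0, mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

The zero returned by `first_zero_j0` is the first of the sampling locations $k_{0,n}$ on which a Kramer-type expansion with a Bessel kernel would be evaluated.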


5.  WARPING  CWT,  FRAMES  AND  DWT 

In Section 3, an invertible motion-based warping transformation has been constructed from group $G_1$ to $G_2$ and then to $G_3$. In fact, we have many more properties to state on the CWT, frames and DWT.

Unitary, irreducible and square-integrable warping preserves the inner product and hence the CWT. Indeed, if $s, \Psi \in H$, $s_\gamma = \Pi_\gamma(s)$ and $\Psi_\gamma = \Pi_\gamma(\Psi_{g_1}) = \Psi_{g_i}$, $i = 2, 3$, the inner product is preserved by the warping such that

$$\langle s_\gamma, \Psi_\gamma \rangle = \langle s, \Psi_{g_1} \rangle \qquad (28)$$

This means that the motion-based CWT computed on a moving signal (rigid pattern) is equivalent to computing the multi-dimensional affine CWT on the still/frozen version of the same signal (pattern).
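A discrete analogue of Equation (28) can be sketched in Python as follows. The warp $[\Pi f](x,t) = f(x - vt, t)$ is applied on a periodic grid with one spatial dimension and an integer velocity, so that it is a permutation of the samples and therefore unitary. The grid sizes, the integer-velocity restriction and the test arrays are assumptions made for illustration only.

```python
def warp(frames, v):
    """Galilean warp [Pi f](x, t) = f(x - v*t, t) on a periodic grid.

    `frames` is a list over time t of rows over position x; v is an
    integer velocity in samples per frame (hypothetical setting).
    """
    n = len(frames[0])
    return [[row[(x - v * t) % n] for x in range(n)]
            for t, row in enumerate(frames)]

def inner(a, b):
    """Real inner product over the whole space-time grid."""
    return sum(p * q for ra, rb in zip(a, b) for p, q in zip(ra, rb))

# Hypothetical still signal and analysing pattern on an 8 x 4 grid.
S = [[float((3 * x + 5 * t) % 7) for x in range(8)] for t in range(4)]
PSI = [[float((x * x + t) % 5) for x in range(8)] for t in range(4)]
```

Correlating the moving signal `warp(S, v)` with the moving pattern `warp(PSI, v)` gives the same number as correlating the still versions, which is the content of (28) in this toy setting.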


The warping of the multi-dimensional affine frame computed from the UIRs (11) gives rise to the same frames as computed directly from the CWT of the corresponding motion-based UIRs, i.e. to motion-compensated frames and convolutional filters. Motion-compensated structures are defined as structures applied on the assumed trajectory of motion. For the frame operator, the same conclusions as for the CWT apply. If $E = \{\phi_j\}$ is a frame for $G_1$, then $E_\gamma = \{\Pi_\gamma(\phi_j)\}$ is a frame for any $G_i$ with rescaled bounds. Since the inner product is preserved, the reconstruction process delivers a still version of the moving signal (pattern).

The warping of the multi-dimensional affine DWT and its MRA does not, in full generality, give rise to DWTs (or ONBs) on the corresponding motion group, but instead gives Riesz bases or exact frames. To have a DWT in the Galilei group, we need velocity vectors whose components provide integer translations. For integer velocities and discrete group parameters $g$, as defined in [13], the Galilean case mimics the affine
group  as  follows.  Let  a,  =  2  and  t)  = 

F  —  nbb »  —  nvv*(t  —  nTr»)  ,  t  —  n,r.)  where 

we  retrieve  the  ONBs 

=  2-m/’2\fr  (2 ~mx—p,t  —  q)  in  L2(E2  x  E) 

at  p  =  nbb,  +  nvv,nTTt,  q  =  nTr»  with  p  e  Z2,  and  q  €  Z. 


6.  CONCLUSIONS 

This paper has shed some introductory light on the irregular sampling problem associated with motion embedded in image sequences. Related sampling theorems have been stated as selective reconstruction formulae. This theory extends to more general statements involving deformational motion, motion on manifolds, Sturm-Liouville boundary problems and special functions, out of which interesting and practical derivations will be presented. The applications of motion-compensated wavelets in de-noising, interpolating, coding and transmitting digital image sequences have been presented as efficient schemes in [8, 10, 11, 12, 13]. Results on motion-selective reconstructions from digital image sequences will also be presented to demonstrate the effectiveness of this approach.
ABSTRACT

In this paper, the perceptually based loss functions for audio filtering used by Wolfe and Godsill [1] are shown to fit well within a complex-valued Support Vector Machine (SVM) framework. SVM regression is extended to the estimation of complex-valued functions, including the derivation of a variant of the Sequential Minimal Optimisation (SMO) algorithm. Audio filters are derived using this framework, based on an autoregressive (AR) model of audio and two different Hermitian kernel functions. Results are found to be promising, and further improvements are discussed.

1.  INTRODUCTION 

Recent attempts to design audio filters based on perceptual considerations (see, for example, [1-3]) have in general assumed independence between Discrete Fourier Transform (DFT) components. Indeed, this simplifying assumption has also been made in standard approaches to audio signal enhancement [4].

While mathematically convenient, such an assumption is unrealistic. This can be shown by considering a common standard speech model: the autoregression (AR). Assume that the signal of interest is generated by

$$y_n = \sum_{p=1}^{P} a_p\, y_{n-p} + e_n, \qquad (1)$$

where $e_n \sim \mathcal{N}(0, \sigma_e^2)$. With reference to Box and Jenkins [5] and Hopgood [6],

$$\mathbf{Y} \sim \mathcal{N}(0, W_N \Lambda W_N^H), \qquad (2)$$

where $\mathbf{Y}$ is an $N$-length vector of DFT elements, $\Lambda$ is related to $\sigma_e^2$ and the AR coefficients $a$ and is generally not diagonal, and $W_N$ is a matrix with elements $W_N(k+1, n+1) = W_N^{kn} = e^{-j 2\pi k n / N}$, $k, n \in \{0, \ldots, N-1\}$.
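The claim that the DFT components of an AR signal are correlated can be illustrated with a small Python sketch. It assumes a stationary AR(1) process with a Toeplitz time-domain covariance (an assumption made here for simplicity, not a statement about the derivation in [5, 6]), forms the covariance of the DFT vector, and inspects its off-diagonal entries. The coefficient 0.9 and the block length 8 are arbitrary.

```python
import cmath

def ar1_covariance(a, sigma2, n):
    """Toeplitz covariance of a stationary AR(1) process y_t = a*y_{t-1} + e_t."""
    return [[sigma2 * a ** abs(i - j) / (1.0 - a * a) for j in range(n)]
            for i in range(n)]

def dft_matrix(n):
    """DFT matrix with entries exp(-j*2*pi*k*m/n)."""
    return [[cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n)]
            for k in range(n)]

def dft_covariance(r):
    """Covariance W R W^H of the DFT of a zero-mean vector with covariance R."""
    n = len(r)
    w = dft_matrix(n)
    wr = [[sum(w[k][m] * r[m][j] for m in range(n)) for j in range(n)]
          for k in range(n)]
    return [[sum(wr[k][j] * w[l][j].conjugate() for j in range(n))
             for l in range(n)] for k in range(n)]

def max_off_diagonal(c):
    """Largest magnitude among the off-diagonal entries of a square matrix."""
    n = len(c)
    return max(abs(c[k][l]) for k in range(n) for l in range(n) if k != l)
```

For white noise (`a = 0`) the off-diagonal entries vanish, whereas for `a = 0.9` they are clearly non-zero, so the frequency bins are not independent.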

Wolfe and Godsill [1] consider a loss function based on masked thresholds, $\epsilon_k$, below which additive noise is assumed to be imperceptible to human listeners:

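$$L(\hat{Y}_k, Y_k) = \begin{cases} 0, & \big|\, |\hat{Y}_k| - |Y_k| \,\big| < \epsilon_k, \\ (|\hat{Y}_k| - |Y_k|)^2 - \epsilon_k^2, & \text{otherwise.} \end{cases} \qquad (3)$$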

*  Material  by  the  second  author  is  based  upon  work  supported  under  a 
U.S.  National  Science  Foundation  Graduate  Fellowship. 


This results in an estimation of the magnitude of the DFT components; in the absence of a quantitative perceptual motivation the observed phase is retained. Masked thresholds are calculated at each frequency bin for a given short-time block via the masking model proposed in [7], which takes into account both simultaneous masking and absolute hearing thresholds, and has been used in other recent perceptually motivated noise reduction systems [2, 3].

This paper uses the perceptually based loss function of (3) in a Support Vector Machine (SVM) framework, as described in Section 2. A variant of the Sequential Minimal Optimisation (SMO) algorithm for the complex problem is presented in Section 3. Experimental results are presented in Section 4.

2.  SVM  FRAMEWORK 

As demonstrated in (2), the ubiquitous AR model of audio is incompatible with the assumption of independence of frequency components in the DFT. Intuitively, and from informal observations, it appears reasonable to favour the idea that there is some correlation. This being the case, it seems logical to consider some form of kernel-based regression in order to capture this correlation while at the same time allowing the freedom of nonlinear estimation. The most readily identifiable problem with this approach is the introduction of the underlying assumption that the audio statistics remain constant over time, which, in itself, is not necessarily accurate. This reservation is, of course, a standard one for such audio filtering and should be borne in mind when considering the final results.

2.1. Kernel-Based Regression

Consider the problem of estimating a latent function relating some input, $\mathbf{x}$, with a corresponding output, $y$,

$$y = f(\mathbf{x}),$$

using some training data, $D = \{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$. Assume that the output data is drawn from a probability density function

$$y \mid f(\cdot), \mathbf{x} \sim p_{Y|F,X}(y \mid f(\cdot), \mathbf{x}).$$

Using this, a posterior probability density function for $f(\cdot)$ can be found,

$$p_{F|Y,X}(f(\cdot) \mid y, \mathbf{x}) \propto p_{Y|F,X}(y \mid f(\cdot), \mathbf{x})\, p_F(f(\cdot)), \qquad (4)$$

where $p_F(f(\cdot))$ is the prior probability density function. It is the aim of many authors [8-10] to relate this formulation to ideas of regularization, also known as stabilization, or prior smoothing. In the case at hand, $p_F(f(\cdot))$ would be related to the result in (2).

Taking negative logs of (4) yields

$$H[f(\cdot)] = V[\mathbf{x}, y, f(\cdot)] + \Omega[f(\cdot)], \qquad (5)$$

where $V[\mathbf{x}, y, f(\cdot)]$ can be interpreted as some error cost function or loss function, used to measure the interpolation error, and $\Omega[f(\cdot)]$ as a smoothness functional, stabilizer, or regularization term. A standard situation is one which sees

$$y_i = f(\mathbf{x}_i) + v_i,$$

where $v_i \sim \mathcal{N}(0, \sigma^2)$. In this case,

$$V[\mathbf{x}, y, f(\cdot)] \propto \sum_{i=1}^{N} (y_i - f(\mathbf{x}_i))^2.$$

In any case for which this has, in combination with $\Omega[f(\cdot)]$, a single minimum, finding $f(\cdot)$ at this minimum is equivalent to finding the maximum a posteriori (MAP) solution for $f(\cdot)$.

The regularization term in kernel-based regression is proportional to the square of the norm of the projection of the function into some Reproducing Kernel Hilbert Space (RKHS) $\mathcal{H}_K$, a subspace of some Hilbert space, $\mathcal{H}$, in which $f(\cdot)$ is taken to exist,

$$\Omega[f(\cdot)] = \lambda \| P_K f(\cdot) \|^2_{\mathcal{H}_K}.$$

It is beyond the scope of this summary to go into detail; related references are [9, 10], among others.

Traditional, real-valued support vector regression is a subset of kernel-based regression in which the Hilbert space containing the function is constrained to the span of some kernel, $K(\cdot, \cdot)$, and a constant. This has the result that the functional approximation takes the form

$$f(\cdot) = \sum_{i=1}^{N} \beta_i K(\cdot, \mathbf{x}_i) + b. \qquad (6)$$

Appropriate kernels which satisfy Mercer's conditions (see, for example, Smola and Schölkopf [9]) can also be considered to be the inner product of some mapping to a feature space $\mathcal{F}$, $\Phi(\mathbf{x})$; that is, $K(\mathbf{x}_i, \mathbf{x}_j) = \langle \Phi(\mathbf{x}_i), \Phi(\mathbf{x}_j) \rangle_{\mathcal{F}}$. From this perspective, there exists some vector, $\mathbf{w}$, in the feature space such that

$$\sum_{i=1}^{N} \beta_i K(\cdot, \mathbf{x}_i) = \langle \mathbf{w}, \Phi(\cdot) \rangle_{\mathcal{F}},$$

and it can be shown that

$$\Omega[f(\cdot)] = \lambda \| \mathbf{w} \|^2_{\mathcal{F}}.$$

Typically, when using an SVM approach for regression, a threshold-based loss function is employed. This has the advantage that the vector $\boldsymbol\beta$ may well be sparse, thereby reducing computational requirements.

2.2. Complex-Valued SVM

Returning to the original problem, with the consideration of correlation between the frequency components, a more applicable starting point (in that it includes phase considerations) is

$$\text{Minimise} \quad \sum_k \eta_k^2 + \lambda \|\mathbf{w}\|^2 \qquad \text{Subject to} \quad |f(X_k) - Y_k| \le \epsilon_k + \eta_k. \qquad (7)$$

This constraint, although appearing to be linear, is in fact quadratic, making the problem much harder to solve. Linear constraints are preferable since the problem then reduces to that of quadratic programming; separating the above constraints such that there is individual consideration of the real and imaginary components can reduce the problem to the more desirable linear form.

Before doing this, consider the representation in the complex feature space where

$$\langle \mathbf{w}, \Phi(X_k) \rangle_{\mathcal{F}} = \mathbf{w}^H \Phi(X_k) = \mathbf{w}_R^T \Phi_R(X_k) + \mathbf{w}_I^T \Phi_I(X_k) + j\big(\mathbf{w}_R^T \Phi_I(X_k) - \mathbf{w}_I^T \Phi_R(X_k)\big),$$

with subscripts $R$ and $I$ denoting real and imaginary components respectively. In this paper, looking to the simpler case of linear loss outside the threshold, the problem in equation (7) can then be rewritten as

$$\text{Minimise} \quad C \sum_k (\eta_k + \hat\eta_k + \eta_k^* + \hat\eta_k^*) + \tfrac{1}{2} \|\mathbf{w}\|^2$$

$$\text{Subject to} \quad \begin{cases} Y_{R,k} - \mathbf{w}_R^T \Phi_R(X_k) - \mathbf{w}_I^T \Phi_I(X_k) - b_R \le \epsilon_{R,k} + \eta_k \\ \mathbf{w}_R^T \Phi_R(X_k) + \mathbf{w}_I^T \Phi_I(X_k) + b_R - Y_{R,k} \le \epsilon_{R,k} + \hat\eta_k \\ Y_{I,k} - \mathbf{w}_R^T \Phi_I(X_k) + \mathbf{w}_I^T \Phi_R(X_k) - b_I \le \epsilon_{I,k} + \eta_k^* \\ \mathbf{w}_R^T \Phi_I(X_k) - \mathbf{w}_I^T \Phi_R(X_k) + b_I - Y_{I,k} \le \epsilon_{I,k} + \hat\eta_k^* \\ \eta_k, \hat\eta_k, \eta_k^*, \hat\eta_k^* \ge 0. \end{cases} \qquad (8)$$

This can be used to form the Lagrangian,

$$\begin{aligned} L = {} & C \sum_k (\eta_k + \hat\eta_k + \eta_k^* + \hat\eta_k^*) + \tfrac{1}{2} \mathbf{w}^H \mathbf{w} \\ & - \sum_k \Big[ \alpha_k \big(\mathbf{w}_R^T \Phi_R(X_k) + \mathbf{w}_I^T \Phi_I(X_k) + b_R - Y_{R,k} + \epsilon_{R,k} + \eta_k\big) \\ & \qquad + \hat\alpha_k \big(Y_{R,k} - \mathbf{w}_R^T \Phi_R(X_k) - \mathbf{w}_I^T \Phi_I(X_k) - b_R + \epsilon_{R,k} + \hat\eta_k\big) \\ & \qquad + \alpha_k^* \big(\mathbf{w}_R^T \Phi_I(X_k) - \mathbf{w}_I^T \Phi_R(X_k) + b_I - Y_{I,k} + \epsilon_{I,k} + \eta_k^*\big) \\ & \qquad + \hat\alpha_k^* \big(Y_{I,k} - \mathbf{w}_R^T \Phi_I(X_k) + \mathbf{w}_I^T \Phi_R(X_k) - b_I + \epsilon_{I,k} + \hat\eta_k^*\big) \Big] \\ & - \sum_k \big( r_k \eta_k + \hat r_k \hat\eta_k + r_k^* \eta_k^* + \hat r_k^* \hat\eta_k^* \big), \end{aligned}$$

where $\{\alpha_k, \hat\alpha_k, \alpha_k^*, \hat\alpha_k^*\}$ and $\{r_k, \hat r_k, r_k^*, \hat r_k^*\}$ are Kuhn-Tucker multipliers. Applying an extension of the standard SVM steps leads to a dual expression to maximise,

$$\sum_k \big( \cdots - [\epsilon_{R,k} |\beta_{R,k}| + \epsilon_{I,k} |\beta_{I,k}|] \big) - \boldsymbol\beta^H K \boldsymbol\beta,$$

where $\beta_k = (\alpha_k - \hat\alpha_k) - j(\alpha_k^* - \hat\alpha_k^*)$, $\sum_k \beta_k = 0$ and $K$ is the matrix with $(i,j)$th entry $K(\mathbf{x}_i, \mathbf{x}_j)$. With this formulation,

$$f(\cdot) = \sum_k \bar\beta_k K(\cdot, X_k) + b,$$

where $\bar\beta_k$ is the complex conjugate of $\beta_k$.
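The final expansion and the split of the threshold into real and imaginary parts can be sketched in Python as follows. This is only an illustration under stated assumptions: the kernel `hermitian_linear_kernel` is a hypothetical Hermitian linear kernel (the paper's exact kernels are not reproduced here), and the support points, coefficients and thresholds in the usage are made up.

```python
def hermitian_linear_kernel(x, y):
    """Hypothetical Hermitian kernel K(x, y) = sum_i conj(x_i) * y_i.

    It satisfies K(x, y) = conj(K(y, x)).
    """
    return sum(a.conjugate() * b for a, b in zip(x, y))

def predict(x, support, beta, b, kernel=hermitian_linear_kernel):
    """Complex-valued expansion f(x) = sum_k conj(beta_k) K(x, X_k) + b."""
    return sum(bk.conjugate() * kernel(x, xk)
               for bk, xk in zip(beta, support)) + b

def split_loss(f_val, target, eps_r, eps_i):
    """Linear epsilon-insensitive loss applied separately to the real and
    imaginary parts of the error, mirroring the four linear constraints."""
    err = f_val - target
    return (max(0.0, abs(err.real) - eps_r)
            + max(0.0, abs(err.imag) - eps_i))
```

For example, `split_loss(1 + 2j, 0.5 + 0.5j, 0.2, 0.3)` penalises the real error 0.5 beyond its threshold 0.2 and the imaginary error 1.5 beyond its threshold 0.3, while any estimate inside both thresholds incurs zero loss.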

3.  COMPLEX  SMO 

The Sequential Minimal Optimisation (SMO) algorithm was developed for SVM classification by Platt [11], and presented for regression by Flake and Lawrence [12]. The central idea of the SMO algorithm is that, when the Lagrangian is maximised with respect to two points (generically labelled $\beta_1$ and $\beta_2$), the maximisation becomes analytically tractable.

The complex-valued problem can be similarly structured and effectively decomposes into real and imaginary cases; an overview of it will be given here. As for the studied cases, the two points are constrained to add to a constant. First the new value of $\beta_2$ is found and, if outside the possible region, it is clipped to the nearest extremum. With this result the corresponding value for $\beta_1$ is found by subtracting the new $\beta_2$ from the old sum.

The unclipped updates for the real and imaginary points are,

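$$\beta_{R,2}^{\text{new}} = \beta_{R,2}^{\text{old}} + \frac{E_{R,1} - E_{R,2} - \big(\epsilon_{R,2}\, \mathrm{sgn}(\beta_{R,2}) - \epsilon_{R,1}\, \mathrm{sgn}(\beta_{R,1})\big)}{K_{11} + K_{22} - K_{12} - K_{21}} \qquad (9)$$

$$\beta_{I,2}^{\text{new}} = \beta_{I,2}^{\text{old}} + \frac{E_{I,2} - E_{I,1} - \big(\epsilon_{I,2}\, \mathrm{sgn}(\beta_{I,2}) - \epsilon_{I,1}\, \mathrm{sgn}(\beta_{I,1})\big)}{K_{11} + K_{22} - K_{12} - K_{21}}$$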

where $E_j = f(X_j) - Y_j$. Note that both terms are real and, importantly, that $K_{12} = \bar{K}_{21}$.
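The clip-then-conserve step described above can be sketched in Python as follows. The function takes an already computed unclipped value for $\beta_2$ and is applied separately to the real and to the imaginary components; the box limits `lo` and `hi` and all numerical values are hypothetical, since the feasible region is not spelled out in this summary.

```python
def clip(value, lo, hi):
    """Clip a scalar to the closed interval [lo, hi]."""
    return max(lo, min(hi, value))

def smo_pair_step(beta1_old, beta2_old, beta2_unclipped, lo, hi):
    """One pairwise update on a single (real or imaginary) component.

    beta2 is clipped to the nearest extremum of [lo, hi] if needed, and
    beta1 is then chosen so that beta1 + beta2 keeps its old value.
    """
    total = beta1_old + beta2_old
    beta2_new = clip(beta2_unclipped, lo, hi)
    beta1_new = total - beta2_new
    return beta1_new, beta2_new
```

For instance, `smo_pair_step(0.4, -0.1, 1.7, -1.0, 1.0)` clips the proposed 1.7 to 1.0 and returns a first coefficient of about -0.7, so the pair still sums to 0.3.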

4.  RESULTS  AND  DISCUSSION 

Preliminary trials have been conducted using both linear and Gaussian kernels, with promising results. In these experiments a 30-second male voice recording was used for training as well as estimation, keeping the sound source consistent. Of this, approximately six seconds were used as training data, in the form of 200 DFTs of time interval length 512, with an overlap in time of 50%. The speech signal was degraded artificially with additive white Gaussian noise to yield a signal-to-noise ratio of 15 dB; audio examples typical of the results obtained are available at
http://www-sigproc.eng.cam.ac.uk/~sih22.
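The block arithmetic of this setup can be checked with a short Python sketch: blocks of length 512 with 50% overlap advance by 256 samples, so 200 blocks span 51456 samples. The sampling rate is not stated in the text; the quoted "approximately six seconds" would correspond to a rate somewhere near 8 kHz, which is an inference rather than a reported figure. The helper names are hypothetical.

```python
def hop_size(frame_len=512, overlap=0.5):
    """Number of samples between the starts of consecutive blocks."""
    return int(frame_len * (1.0 - overlap))

def frame_starts(num_samples, frame_len=512, overlap=0.5):
    """Start indices of all complete overlapping blocks in a signal."""
    if num_samples < frame_len:
        return []
    return list(range(0, num_samples - frame_len + 1, hop_size(frame_len, overlap)))

def samples_for_frames(num_frames, frame_len=512, overlap=0.5):
    """Number of samples spanned by a given number of overlapping blocks."""
    return (num_frames - 1) * hop_size(frame_len, overlap) + frame_len
```

Calling `samples_for_frames(200)` gives 51456, and `frame_starts(51456)` returns exactly 200 block positions.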

Training was conducted for each frequency bin; in every case the 129 nearest noisy frequency bins were used as input to the filter. For each instance a corresponding set of 200 $\beta$ values was found. The final filters were then implemented on all data and the original time signal reconstructed.

Initial results indicate that, graphically and on a local basis, the algorithm is performing as desired. This is illustrated by a representative example in Figure 1. Here it is clear that, in the main, the filter output lies within the threshold regions, as intended. Frequency bins have been chosen to illustrate different phenomena of the same DFT and, as such, note should be taken of the amplitude scales.


Fig. 1. Successful filtering in the frequency domain (horizontal axis: frequency bins). The original signal is shown as a continuous line, the thresholds as dashed lines, the filter output as a dotted continuous line and the received noisy signal as points. Note that the filtered output closely follows the original signal.


While visibly good, audibly the results achieved are not yet superior to current perceptually motivated techniques, e.g., [1-3]. On closer inspection some reasons for this become apparent. Two readily observed problem cases are shown in Figure 2. The first of these demonstrates an extreme case of amplitude underestimation. It has been observed that the frequency estimates tend to be lower than the original signal more often than higher; this is even the case when the estimate is within the thresholds. It may be possible to overcome this problem through a more intelligent choice of kernel; see, for example, [13]. Here the linear kernel has been used for initial investigations, which is perhaps more appropriate in the estimation of smooth curves, as indeed underestimation in this case appears to be due to smoothing.

The lower plots in Figure 2 illustrate what may be a slight failing of the a priori assumptions, namely that regions with broad thresholds allow significant deviation from the original signal. While it has been claimed that the thresholds are derived such that this variation is not audible, this is for individual variation with respect to the global status quo. In the case that a large number of variations are occurring and, in addition, the smoothing discussed previously is flattening the overall DFT (albeit all within thresholds), then clearly a significant global deviation from the original is taking place. Obviously there is no simple solution to this, it being an essential part of the overall framework. One approach may be to identify regions of excessively large margin, such as that illustrated, and to reduce them. Ideas similar to this appear in [1].

Fig. 2. Problematic filtering in the frequency domain. The original signal is shown as a continuous line, the thresholds as dashed lines, the filter output as a dotted continuous line and the received noisy signal as points.

In addition to the extensions mentioned, to date no rigorous parameter determinations have been made. Note that this is a problem faced in all SVM applications; methods for such determinations appear in [14], among others. As well, a linear loss function has been used, which is not at all perceptually derived. As the loss function determines the tradeoff in the optimisation process, it appears sensible to attempt to determine, in some manner, one which better reflects the sensitivities of the ear.

5.  CONCLUSION 

It has been seen that the perceptually based loss functions of [1] fit well in a complex-valued SVM framework. In addition, this framework allows a considerable extension of the audio model, incorporating more realistic prior belief about the correlation between frequency components. This includes the previously unconsidered (in the context of perceptually based filtering) aspect of prior belief about phase. In this sense the algorithm takes a more holistic approach to estimating the spectrum of the audio signal.

In the course of applying SVMs to the problem of perceptual audio filtering, new results have been presented on the application of SVMs to the estimation of a complex-valued function. These include the derivation of the Lagrangian formulation and of the complex SMO algorithm. These are results which should prove more widely applicable for such problems.

While a large proportion of Section 4 dwelt on potential improvements regarding the audio results, it is important to emphasise that the results obtained did improve the quality of the signal, albeit not to state-of-the-art levels. This is no surprise given that the present results stem from initial investigations, and several choices (e.g., parameters, kernel, and loss function) have yet to be optimised. However, the preliminary results presented herein indicate that, with the refinements discussed, further improvements will likely be possible.
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ABSTRACT 

Real  room  acoustic  impulse  responses  (AIRs)  modelled  by  infinite 
impulse  response  (IIR)  filters  require  high  model  orders.  Many 
problems  involving  the  estimation  of  AIRs  reduce  to  high  dimen¬ 
sional  optimisation  problems.  Subband  autoregressive  (AR)  mod¬ 
elling  techniques  reduce  this  difficult  optimisation  problem  to  a 
number of simpler low dimensional optimisations. This paper introduces
a formulation for subband AR modelling in a probabilistic
framework which facilitates robust Bayesian parameter estimation.
The  paper  also  provides  new  results  to  show  that  the  subband  AR 
representation  accurately  models  typical  AIRs  and,  therefore,  is 
suitable  for  modelling  room  reverberation. 

1.  INTRODUCTION 

The transfer function due to the acoustics of a room generally does
not change considerably with time, but does vary with the spatial
locations of the sound source and observer. Assuming both are
spatially stationary, a linear time-invariant (LTI) model is appropriate.
The all-pole model can parsimoniously approximate rational
transfer functions, and typical all-pole model orders required
for approximating room transfer functions (RTFs) are in the range
$50 < P < 500$, around a factor of 40 lower than all-zero model
orders [1]. A room acoustic impulse response (AIR), $h(t)$, may be
modelled by an LTI all-pole filter of order $P$, as given by:

$$h(t) = -\sum_{p \in \mathcal{P}} a(p)\, h(t-p) + \delta(t), \quad t \in \mathbb{Z} \qquad (1)$$

where $\mathbf{a} = \{a(p),\ p \in \mathcal{P} = \{1, \ldots, P\}\}$ are the model parameters,
$P$ is the number of poles, and $\delta(t)$ is the Kronecker delta.

In many applications, such as single channel blind dereverberation [2],
an estimate of the AIR is required and, in general, this reduces
to a high-dimensional optimisation problem. This is difficult
to solve because attempts to model the entire acoustic spectrum by
a single IIR filter lead to a large computational load, as well as
numerical problems resulting from the size of the parameter space.
The problem is that the all-pole model must simultaneously fit the
entire frequency range, even though the model may fit some regions
in this frequency space better than others. Thus, it is better
to model a particular frequency band of the filter's spectrum by an
all-pole model, resulting in a lower model order for that frequency
band and, therefore, improved parameter estimation. Effectively,
the modelling of different frequency bands has been decoupled,
leading to a better model fit and also reducing a high-dimensional
optimisation problem to a number of low-dimensional ones.

Subband methods have previously been used to model acoustic
environments with much success [3-6]. Subband linear prediction
has been considered in [7-9]. This paper introduces a
probabilistic formulation for subband AR modelling which leads
to  Bayesian  parameter  estimation.  The  paper  also  demonstrates 
that  subband  AR  models  are  suitable  for  modelling  room  acous¬ 
tics  and,  therefore,  are  suitable  for  modelling  room  reverberation. 

2.  FREQUENCY  DOMAIN  FORMULATION 

If the room is excited by white Gaussian noise (WGN), the parameter
vector of (1), $\mathbf{a}$, can be estimated by considering (1) as an AR
process. In the time-domain formulation of the method of least squares,
it is sought to find the $\mathbf{a}$ which minimises the expected value
of the square of the excitation sequence for the AR sequence:

$$s(t) = -\sum_{p \in \mathcal{P}} a(p)\, s(t-p) + e(t), \quad \forall t \in \mathbb{Z} \qquad (2)$$

where $e(t) \sim \mathcal{N}(e(t) \mid 0, \sigma^2)$ and $s(t)$ are the input excitation
and output, respectively. However, estimators for AR models can
also be formulated in the frequency domain [10].

2.1.  Likelihood  Function 

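The data sequence $\{s(t),\ t \in \mathcal{T} = \{0, \ldots, T-1\}\}$ denotes a
segment of the infinite sequence introduced in (2) and, for simplicity,
is assumed to be periodic; as $T \to \infty$, this approximation
becomes more accurate. Application of the DFT to (2) gives:

$$\mathcal{E}(k) = S(k) + \sum_{p \in \mathcal{P}} a(p)\, \underbrace{e^{-j 2\pi p k / T}\, S(k)}_{[\mathbf{S}_p]_k} \qquad (3)$$

Denoting $\boldsymbol{\mathcal{E}} = \{\mathcal{E}(k),\ k \in \mathcal{K} = \mathcal{T}\}$, (3) may be written as:

$$\boldsymbol{\mathcal{E}} = \boldsymbol{\mathcal{S}} + \mathbf{S}\mathbf{a} \qquad (4)$$

where $\mathbf{S} = [\mathbf{S}_1 \ \ldots \ \mathbf{S}_P]$ and $[\mathbf{S}_p]_k = \exp\{-j 2\pi p k / T\}\, S(k)$, $k \in \mathcal{K}$.
Define the DFT matrix, normalised to be unitary, by
$[\mathbf{W}_T]_{k+1,t+1} \propto \exp\{-j 2\pi k t / T\}$, $\forall k \in \mathcal{K}$, $\forall t \in \mathcal{T}$, so that
$\boldsymbol{\mathcal{E}} = \mathbf{W}_T \mathbf{e}$. Noting that $\mathbf{e}$ is WGN and that $\mathbf{W}_T \mathbf{W}_T^{\dagger} = \mathbf{I}_T \Rightarrow |\mathbf{W}_T| = 1$,
where $\mathbf{I}_T \in \mathbb{R}^{T \times T}$ is the identity matrix, the probability transformation

$$p_{\mathcal{E}}(\boldsymbol{\mathcal{E}}) = \frac{1}{|\mathbf{W}_T|}\, p_{e}(\mathbf{W}_T^{\dagger} \boldsymbol{\mathcal{E}})$$

gives:

$$p_{\mathcal{E}}(\boldsymbol{\mathcal{E}}) = \mathcal{N}(\boldsymbol{\mathcal{E}} \mid \mathbf{0}, \sigma^2 \mathbf{I}_T) \qquad (5)$$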

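Since the Jacobian $J(\boldsymbol{\mathcal{S}}, \boldsymbol{\mathcal{E}})$ is unity, the likelihood function is:

$$p_{\mathcal{S}}(\boldsymbol{\mathcal{S}} \mid \mathbf{a}, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{T}} \exp\left\{ -\frac{\|\boldsymbol{\mathcal{S}} + \mathbf{S}\mathbf{a}\|^2}{2\sigma^2} \right\}$$

The maximum-likelihood estimate (MLE) is, therefore,

$$\hat{\mathbf{a}} = -(\mathbf{S}^{\dagger}\mathbf{S})^{-1}\mathbf{S}^{\dagger}\boldsymbol{\mathcal{S}} \qquad (6)$$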

Hence,  with  suitable  priors  for  the  unknown  parameters,  subband 
AR  modelling  can  be  formulated  in  the  Bayesian  framework,  with 
the  likelihood  function  given  above. 
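
As an illustration of the frequency-domain estimator in (6), the following minimal sketch simulates a low-order AR process and recovers its coefficients by least squares on the DFT. The function names, the AR(2) coefficients, the sequence length and the random seed are all hypothetical choices made for this example, and NumPy is assumed to be available; it is not the authors' implementation.

```python
import numpy as np


def simulate_ar(a, T, rng, burn_in=500):
    """Simulate s(t) = -sum_p a[p-1] * s(t-p) + e(t) with unit-variance WGN e(t)."""
    order = len(a)
    n = T + burn_in
    e = rng.standard_normal(n)
    s = np.zeros(n)
    for t in range(n):
        past = sum(a[p - 1] * s[t - p] for p in range(1, order + 1) if t - p >= 0)
        s[t] = e[t] - past
    return s[burn_in:]  # discard the start-up transient


def ar_mle_frequency_domain(s, order):
    """Least-squares solution of (6): a_hat = -(S^H S)^{-1} S^H S_vec."""
    T = len(s)
    S_vec = np.fft.fft(s)                                      # S(k), k = 0..T-1
    k = np.arange(T)[:, None]
    p = np.arange(1, order + 1)[None, :]
    S_mat = np.exp(-2j * np.pi * p * k / T) * S_vec[:, None]   # columns [S_p]_k
    a_hat, *_ = np.linalg.lstsq(S_mat, -S_vec, rcond=None)
    return a_hat.real  # imaginary part is rounding error for a real-valued s(t)


rng = np.random.default_rng(0)
a_true = np.array([-1.5, 0.7])       # hypothetical stable AR(2) coefficients
s = simulate_ar(a_true, T=4096, rng=rng)
a_hat = ar_mle_frequency_domain(s, order=2)
print(a_hat)
```

The estimate is only approximate for finite T, since the derivation treats the data segment as periodic.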

3.  SELECTIVE  SUBBAND  MODELLING 

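Consider modelling the power spectrum, $P(e^{j\omega}) = |S(e^{j\omega})|^2$,
of $s(t)$, where $s(t) \rightleftharpoons S(e^{j\omega})$ is a Fourier transform pair, in the
region $\Omega_k = (\omega_k, \omega_{k+1})$. Consider a signal $\tilde{s}(t)$ whose power
spectrum, $\tilde{P}(e^{j\tilde{\omega}})$, is related to $P(e^{j\omega})$ by the mapping

$$\tilde{P}(e^{j\tilde{\omega}}) = P(e^{j\omega}), \quad \omega = \left\{ \frac{\omega_{k+1} - \omega_k}{\pi} \right\} \tilde{\omega} + \omega_k, \quad \tilde{\omega} \in (0, \pi)$$

It is seen that the region $\omega \in \Omega_k$ is mapped onto $\tilde{\omega} \in (0, \pi)$, and
the new process, $\tilde{s}(t)$, can be modelled as an all-pole filter across
the entire spectrum, with approximate power spectrum:

$$\tilde{P}(e^{j\tilde{\omega}}) = \frac{\tilde{G}_k}{\left| 1 + \sum_{p \in \mathcal{P}_k} a_k(p)\, e^{-j p \tilde{\omega}} \right|^2}, \quad \tilde{\omega} \in (0, \pi) \qquad (7)$$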


Hence, the estimated power spectrum for $P(e^{j\omega})$, over the complete
frequency range, $(0, \pi)$, can be represented by a series of
subband models, as given by:

$$|\hat{S}(e^{j\omega})|^2 = \sum_{k=0}^{K-1} \frac{G_k\, I_{(\omega_k, \omega_{k+1})}(\omega)}{\left| 1 + \sum_{p \in \mathcal{P}_k} a_k(p)\, e^{-j p \pi \frac{\omega - \omega_k}{\omega_{k+1} - \omega_k}} \right|^2} \qquad (8)$$

where $\mathcal{P}_k = \{1, \ldots, P_k\}$, the spectrum of the excitation sequence
is given by $e(t) \rightleftharpoons \mathcal{E}(e^{j\omega})$, $I_{\Omega}(\omega) = 1$ if $\omega \in \Omega$ and zero otherwise,
$\omega_0 = 0$, $\omega_K = \pi$, and $K$ is the number of subbands.
The excitation variance must be scaled proportionally, $\pi \tilde{G}_k = G_k(\omega_{k+1} - \omega_k)$,
since energy must be conserved in the transformation.
Although, in the frequency range $\Omega_k$, (8) models the
power spectrum of the process $s(t)$ and, therefore, the magnitude
of the spectrum $S(e^{j\omega})$, it does not accurately model the phase
of $s(t)$, i.e. $\arg S(e^{j\omega})$, since phase information is lost. Hence, (8)
suggests that $\{s(t)\}$ is related to its excitation sequence $\{e(t)\}$ by:

$$S(e^{j\omega}) = \sum_{k=0}^{K-1} \frac{\sqrt{G_k}\, \mathcal{E}(e^{j\omega})\, I_{(\omega_k, \omega_{k+1})}(\omega)}{1 + \sum_{p \in \mathcal{P}_k} a_k(p)\, e^{-j p \pi \frac{\omega - \omega_k}{\omega_{k+1} - \omega_k}}} \qquad (9)$$

The phase response of a true AR process is always minimum phase.
Consider the typical phase response of an AR process shown in
Figure 1(a). The phase of the subband $\Omega_1 \cup \bar{\Omega}_1 = \{\omega_1 < \omega < \omega_2\} \cup \{2\pi - \omega_2 < \omega < 2\pi - \omega_1\}$
is shown in Figure 1(b), where $\arg S(e^{j\omega_1}) \neq 0$ or $\pi$.
The model in (9) cannot model this phase response; the best
it can do is shown in Figure 1(b). A more accurate model is:


Fig. 1. The Phase Ambiguity. (a) Typical phase response; (b) subband phase response (horizontal axis: angular frequency, rad/sec).


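$$S(e^{j\omega}) = \sum_{k=0}^{K-1} \frac{\sqrt{G_k}\, \mathcal{E}(e^{j\omega})\, e^{j\phi_k(\omega)}\, I_{(\omega_k, \omega_{k+1})}(\omega)}{1 + \sum_{p \in \mathcal{P}_k} a_k(p)\, e^{-j p \pi \frac{\omega - \omega_k}{\omega_{k+1} - \omega_k}}} \qquad (10)$$

where $e^{j\phi_k(\omega)}$ corresponds to an additional phase term to compensate
for the difference between the actual phase response and
the phase response of an AR process with identical magnitude response.
Estimation of this phase term is considered in §5.2.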

Given the model in (10), the analysis in §2.1 can be applied
to each subband, $k \in \{0, \ldots, K-1\}$, to obtain estimates of the
parameters $\mathbf{a}_k$, provided it is reformulated so that the optimisation
is over the frequency range $\omega \in \Omega_k$: i.e. apply (4) to the spectral
error sequence $\mathcal{E}(e^{j\omega_m})$, $\omega_m \in \Omega_k$, where $\omega_m = 2\pi m / T$ and $T$ is
the number of error samples corresponding to the sequence $s(t)$,
and use (6) to obtain an estimate of $\mathbf{a}_k$. The temporal subband AR
modelling method implicitly uses a filter bank network and, therefore,
care must be taken to ensure that the filter bank possesses
perfect reconstruction properties. Details of such techniques are
discussed in [12] but, for brevity, are not taken into account here.

4. SUBBAND MODELLING EXAMPLE




As  an  example  of  subband  AR  modelling,  a  true  8th-order  AR 
spectrum  is  modelled  using  three  subbands;  since  the  number  of 
subbands  and  the  model  order  in  each  band  are  fixed,  the  location 
of the spectral changepoint is determined using Bayesian changepoint
estimation: see [11]. Figure 2 shows the estimated spectra in
each  subband  for  a  particular  choice  of  model  order. 
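
The per-subband estimation described above can be sketched as follows: the DFT bins that fall inside one subband are mapped onto $(0, \pi)$ and a low-order all-pole model is fitted to those bins alone by least squares. This is only a sketch under stated assumptions: the subband edges are fixed at equal widths rather than found by Bayesian changepoint estimation, the test signal is a hypothetical AR(2) process, real-valued coefficients are enforced by stacking real and imaginary parts, and all function names are invented for the example.

```python
import numpy as np


def simulate_ar(a, T, rng, burn_in=500):
    """Simulate s(t) = -sum_p a[p-1] * s(t-p) + e(t) with unit-variance WGN e(t)."""
    n = T + burn_in
    e = rng.standard_normal(n)
    s = np.zeros(n)
    for t in range(n):
        past = sum(a[p - 1] * s[t - p] for p in range(1, len(a) + 1) if t - p >= 0)
        s[t] = e[t] - past
    return s[burn_in:]


def subband_ar_fit(s, order, w_lo, w_hi):
    """Fit real AR coefficients a_k using only the DFT bins in (w_lo, w_hi)."""
    T = len(s)
    S = np.fft.fft(s)
    w = 2.0 * np.pi * np.arange(T) / T                    # bin frequencies w_m
    band = (w > w_lo) & (w < w_hi)                        # bins inside Omega_k
    w_tilde = np.pi * (w[band] - w_lo) / (w_hi - w_lo)    # map band onto (0, pi)
    S_band = S[band]
    p = np.arange(1, order + 1)
    S_mat = np.exp(-1j * np.outer(w_tilde, p)) * S_band[:, None]
    # Stack real and imaginary parts so that the solution is real-valued.
    A = np.vstack([S_mat.real, S_mat.imag])
    b = -np.concatenate([S_band.real, S_band.imag])
    a_k, *_ = np.linalg.lstsq(A, b, rcond=None)
    return a_k


rng = np.random.default_rng(0)
a_true = np.array([-1.5, 0.7])                 # hypothetical AR(2) process
s = simulate_ar(a_true, T=4096, rng=rng)

edges = [0.0, np.pi / 3, 2 * np.pi / 3, np.pi]  # three equal-width subbands
fits = [subband_ar_fit(s, 2, lo, hi) for lo, hi in zip(edges[:-1], edges[1:])]
full_band = subband_ar_fit(s, 2, 0.0, np.pi)    # single band: ordinary AR fit
```

With a single band covering $(0, \pi)$ the mapping is the identity, so the fit reduces to the ordinary frequency-domain AR estimate, which gives a simple sanity check.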


Fig.  2.  Subband  modelling  an  AR(8)  process.  The  original  and 
estimated  spectra  in  each  of  the  three  subbands  are  shown,  and  the 
vertical  line  denotes  the  boundary  of  the  subbands. 


5. SUBBAND MODELLING OF ROOM ACOUSTICS

The subband model in (10) is used to represent a known AIR and,
thus, could be inverted directly. Naturally, in practice the AIR will
be unknown and this will not be possible. This section investigates
the ability of the subband model to equalise the RTF.

5.1.  Reconstructing  the  Magnitude  Frequency  Response 

A  typical  AIR,  measured  in  a  stairwell,  with  no  direct  path  from 
source  to  observer,  is  shown  in  Figure  3(a),  with  magnitude  fre¬ 
quency  response  shown  in  Figure  3(b).  The  length  of  this  impulse 
response is $T = 4000$ samples. The minimum-phase equivalent
of this impulse response [13] is modelled using (10). Then, an
MLE is calculated using (6), where the phase response is modified
as discussed in §5.2; $P_k = 50$, $\forall k \in \{0, \ldots, K-1\}$, $K = 500$,
and the AIR is zero-padded by a factor of 200 to improve numerical
stability [11]. The equalised impulse response is calculated
by  inverting  the  frequency  response  of  the  model  in  each  subband, 
multiplying  by  the  original  frequency  response,  and  taking  the  in¬ 
verse  Fourier  transform.  Figure  4  shows  the  equalised  impulse 
response,  and  the  magnitude  response  of  the  equalised  RTF  shown 
in  Figure  5  indicates  that  the  spectral  coloration  is  significantly 
reduced. 
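
The equalisation step described above (invert the model's frequency response, multiply by the original frequency response, take the inverse Fourier transform) can be illustrated on a toy case. In this sketch the "room" is a hypothetical second-order all-pole filter and the model is assumed to match it exactly, so a single full-band model stands in for the $K$ subband models; with a perfect model the equalised response should collapse to a unit impulse.

```python
import numpy as np

N = 512
a = np.array([-1.5, 0.7])   # hypothetical all-pole coefficients (stable)

# Impulse response of H(z) = 1 / (1 + a1 z^-1 + a2 z^-2), truncated to N samples.
h = np.zeros(N)
for t in range(N):
    past = sum(a[p - 1] * h[t - p] for p in range(1, len(a) + 1) if t - p >= 0)
    h[t] = (1.0 if t == 0 else 0.0) - past

H = np.fft.fft(h)                                  # original frequency response
w = 2.0 * np.pi * np.arange(N) / N
A = 1.0 + sum(a[p - 1] * np.exp(-1j * p * w) for p in range(1, len(a) + 1))
H_model = 1.0 / A                                  # model frequency response

# Invert the model, multiply by the original response, and transform back.
h_eq = np.fft.ifft(H / H_model).real
```

In practice the model is imperfect and differs from subband to subband, which is where the residual coloration and boundary effects discussed in the text come from.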

However,  as  demonstrated  in  Figure  6,  a  closer  inspection  re¬ 
veals  why  the  magnitude  response  contains  many  sharp  spectral 
components:  since  the  model  in  each  subband  is  completely  de¬ 
coupled  from  the  other  subbands,  there  are  discontinuities  in  the 
spectrum  at  the  subband  boundaries.  The  model  in  (10)  does  not 
enforce  any  continuity  between  blocks,  although  this  can  be  en¬ 
sured  by  modifying  the  prior  distributions  for  the  AR  parameters 
such  that  the  end  point  at  the  lower  subband  boundary  is  con¬ 
strained  to  match  the  estimated  spectrum  in  the  previous  subband. 


Fig. 3. Typical acoustic impulse response. (a) Acoustic impulse response (AIR); (b) magnitude frequency response.

Moreover,  the  resonant  spikes  at  the  subband  boundaries  in  Fig¬ 
ure  6  are  due  to  the  implicit  use  of  a  filter  bank  that  does  not 
possess  perfect  reconstruction  properties  [12].  The  modelling  of 
the  magnitude  frequency  response  of  the  nonminimum-phase  AIR 
gives  similar  results  to  the  minimum-phase  system,  since  the  mag¬ 
nitude  responses  are  identical.  For  nonminimum-phase  systems, 
the  phase  response  must  be  modelled  as  discussed  below. 

5.2.  Reconstructing  the  Phase  Frequency  Response 

In modelling the AIR shown in Figure 3(a), it is sought to minimise
the phase discrepancy, $\phi_k(\omega)$, introduced when using a parameter
estimate based on the spectral error function. Estimating $\phi_k(\omega)$
is difficult, so it is modelled using a polynomial such that a least-squares
fit may be obtained. For room acoustics, observational
experiments suggest a good approximation for $\phi_k(\omega)$ is:

$$\phi_k(\omega) \approx \psi_0 + \psi_1 \omega + \psi_2 \omega^2 \qquad (11)$$

The coefficients $\{\psi_i\}$ can be estimated using least-squares. The
equalised impulse response when $\phi_k(\omega)$ is not accounted for is
shown in Figure 7. Compared with Figure 4, where $\phi_k(\omega)$ has
been accounted for, the equalised response is much longer, and
thus no longer accurately reflects an impulse. Acoustic listening
tests, in which a clean speech signal is filtered by each of the
equalised responses in Figures 4 and 7, indicate that the speech
is heavily distorted in the latter case when $\phi_k(\omega)$ is not modelled.

Fig. 4. Equalised impulse response.

Fig. 5. Equalised magnitude response.

Fig. 6. Discontinuities between the subbands.
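
A least-squares fit of the quadratic phase model in (11) might look like the sketch below. The phase samples here are synthetic: the frequency grid, the "true" coefficients and the noise level are hypothetical values chosen only to exercise the fit, and NumPy's polynomial least-squares routine is used in place of whatever solver the authors employed.

```python
import numpy as np
from numpy.polynomial import polynomial as P

rng = np.random.default_rng(1)
w = np.linspace(0.0, np.pi, 200)            # hypothetical frequency grid (rad)
psi_true = np.array([0.3, -2.0, 1.5])       # hypothetical (psi_0, psi_1, psi_2)
phi_clean = P.polyval(w, psi_true)          # psi_0 + psi_1*w + psi_2*w^2
phi_obs = phi_clean + 0.001 * rng.standard_normal(w.size)  # noisy phase samples

# Least-squares fit; coefficients are returned in the order psi_0, psi_1, psi_2.
psi_hat = P.polyfit(w, phi_obs, 2)
phi_fit = P.polyval(w, psi_hat)
```

The fitted curve can then be used as the compensating phase term $e^{j\phi_k(\omega)}$ in the subband model.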

6.  CONCLUSIONS 

In this paper, a subband AR model has been shown to represent
a typical AIR reasonably accurately. The likelihood function for
this spectral model is identical in form to that obtained for a time
series and, therefore, the subband model elegantly fits into the
Bayesian framework [2, 11]. The model produced in this paper
means that a difficult high-dimensional optimisation problem reduces
to a number of simpler low-dimensional optimisation problems.
For nonminimum-phase AIRs, where a causal inverse does
not exist, only subbands which possess minimum-phase characteristics
can be inverted; hence, the method for detecting the minimum-phase
subbands in [5] should be applied to this subband model.

ABSTRACT


Short-time spectral attenuation is a common form of audio signal
enhancement in which a time-varying filter, or suppression rule, is
applied to the frequency-domain transform of a corrupted signal.
The Ephraim and Malah suppression rule for speech enhancement
is both optimal in the minimum mean-square error sense and well
known for its associated colourless residual noise; however, it requires
the computation of exponential and Bessel functions. In this
paper we show that, under the same modelling assumptions, alternative
Bayesian approaches lead to suppression rules exhibiting
almost identical behaviour. We derive three such rules and show
that they are efficient to implement and yield a more intuitive
interpretation.

1. INTRODUCTION

1.1. Short-Time Spectral Attenuation

Short-time spectral attenuation is a popular method of broadband
noise reduction in which a time-varying filter is applied to the
frequency-domain transform of a corrupted audio signal. Often
such a signal is modelled as follows: let $\{x_n\} = \{x(nT)\}$ in
general represent a set of values from a finite-duration analogue
signal sampled regularly at intervals of $T$, so that at time $n$ one
has the additive observation model $y_n = x_n + d_n$, where $y_n$ is the
observed signal, $x_n$ is the original signal, and $d_n$ is random noise.

In many implementations the set of observations $\{y_n\}$ is analysed
using the discrete Fourier transform (DFT), via the overlap-add
method of short-time Fourier analysis and synthesis. Noise
reduction in this manner may be viewed as the application of a
suppression rule, or nonnegative real-valued gain $H_k$, to each bin
$k$ of the observed signal spectrum $\mathbf{Y}_k$, in order to form an estimate
$\hat{\mathbf{X}}_k$ of the original signal spectrum.

In the ensuing discussion of such suppression rules we consider,
for simplicity of notation and without loss of generality, the
case of a single (windowed) short-time block. To facilitate a comparison
our notation follows that of Ephraim and Malah [1], except
that complex quantities appear in bold throughout.

*Material by the first author is based upon work supported under a U.S.
National Science Foundation Graduate Fellowship. The authors also wish
to acknowledge the contribution of Shyue Ping Ong to this paper.

1.2. The Ephraim and Malah Suppression Rule

Ephraim and Malah [1] derive a minimum mean-square error
(MMSE) short-time spectral amplitude estimator for speech enhancement
under the assumption that the Fourier expansion coefficients
of the original signal $x_n$ and the noise $d_n$ may be modelled
as independent, zero-mean, Gaussian random variables. Thus the
observed spectral component in DFT bin $k$, $\mathbf{Y}_k = R_k \exp(j\vartheta_k)$,
is equal to the sum of the spectral components of the signal,
$\mathbf{X}_k = A_k \exp(j\alpha_k)$, and the noise, $\mathbf{D}_k$. This model leads to the following
marginal, joint, and conditional distributions:


$$p(a_k) = \begin{cases} \dfrac{2 a_k}{\lambda_x(k)} \exp\left(-\dfrac{a_k^2}{\lambda_x(k)}\right) & \text{if } a_k \in [0, \infty), \\ 0 & \text{otherwise.} \end{cases} \qquad (1)$$

$$p(\alpha_k) = \begin{cases} \dfrac{1}{2\pi} & \text{if } \alpha_k \in [-\pi, \pi), \\ 0 & \text{otherwise.} \end{cases} \qquad (2)$$

$$p(a_k, \alpha_k) = \frac{a_k}{\pi \lambda_x(k)} \exp\left(-\frac{a_k^2}{\lambda_x(k)}\right) \qquad (3)$$

$$p(\mathbf{Y}_k \mid a_k, \alpha_k) = \frac{1}{\pi \lambda_d(k)} \exp\left(-\frac{|\mathbf{Y}_k - a_k e^{j\alpha_k}|^2}{\lambda_d(k)}\right) \qquad (4)$$

where it is understood that (3) and (4) are defined over the range of
$a_k$ in (1) and $\alpha_k$ in (2); $\lambda_x(k) = E[|\mathbf{X}_k|^2]$ and $\lambda_d(k) = E[|\mathbf{D}_k|^2]$
denote the respective variances of the $k$th short-time spectral component
of the signal and noise. The MMSE spectral amplitude estimator
derived by Ephraim and Malah, when combined with their
derived optimal phase estimator (the observed phase $\vartheta_k$ [1]), takes
the form of a suppression rule:

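$$H_k = \frac{\sqrt{\pi \upsilon_k}}{2 \gamma_k} \left[ (1 + \upsilon_k)\, I_0\!\left(\frac{\upsilon_k}{2}\right) + \upsilon_k\, I_1\!\left(\frac{\upsilon_k}{2}\right) \right] \exp\left(-\frac{\upsilon_k}{2}\right), \qquad (5)$$

where $I_0(\cdot)$ and $I_1(\cdot)$ denote the modified Bessel functions of order
zero and one, respectively. Additionally,

$$\frac{1}{\lambda(k)} \triangleq \frac{1}{\lambda_x(k)} + \frac{1}{\lambda_d(k)}$$

and

$$\upsilon_k \triangleq \frac{\xi_k}{1 + \xi_k}\, \gamma_k; \qquad \xi_k \triangleq \frac{\lambda_x(k)}{\lambda_d(k)}, \qquad \gamma_k \triangleq \frac{R_k^2}{\lambda_d(k)},$$

where $\xi_k$ and $\gamma_k$ are interpreted after [2] as the a priori and a
posteriori SNR, respectively.

2. DERIVATION OF EFFICIENT APPROXIMATIONS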


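2.1. Joint Maximum A Posteriori Spectral Amplitude and
Phase Estimator

Joint estimation of the real and imaginary components of $\mathbf{X}_k$ under
either the maximum a posteriori (MAP) or MMSE criterion leads
to the Wiener estimator (due to symmetry of the resultant posterior
distribution, which is Gaussian). However, one may reformulate
the problem in terms of spectral amplitude $A_k$ and phase $\alpha_k$,
and then obtain a joint MAP estimate by maximising the posterior
distribution $p(a_k, \alpha_k \mid \mathbf{Y}_k)$:

$$p(a_k, \alpha_k \mid \mathbf{Y}_k) \propto p(\mathbf{Y}_k \mid a_k, \alpha_k)\, p(a_k, \alpha_k) \propto \frac{a_k}{\pi^2 \lambda_x(k) \lambda_d(k)} \exp\left( -\frac{|\mathbf{Y}_k - a_k e^{j\alpha_k}|^2}{\lambda_d(k)} - \frac{a_k^2}{\lambda_x(k)} \right).$$

Since $\ln(\cdot)$ is a monotonically increasing function, one may equivalently
maximise the natural logarithm of $p(a_k, \alpha_k \mid \mathbf{Y}_k)$. Define

$$J_1 = -\frac{|\mathbf{Y}_k - a_k e^{j\alpha_k}|^2}{\lambda_d(k)} - \frac{a_k^2}{\lambda_x(k)} + \ln a_k + \text{constant}.$$

Differentiating $J_1$ w.r.t. $\alpha_k$ yields

$$\frac{\partial J_1}{\partial \alpha_k} = -\frac{1}{\lambda_d(k)} \left[ (\mathbf{Y}_k^* - a_k e^{-j\alpha_k})(-j a_k e^{j\alpha_k}) + (\mathbf{Y}_k - a_k e^{j\alpha_k})(j a_k e^{-j\alpha_k}) \right].$$

Setting to zero and substituting $\mathbf{Y}_k = R_k \exp(j\vartheta_k)$, we get

$$0 = j a_k R_k e^{j(\vartheta_k - \alpha_k)} - j a_k R_k e^{-j(\vartheta_k - \alpha_k)} = j a_k R_k \cdot 2j \sin(\vartheta_k - \alpha_k),$$

and therefore

$$\hat{\alpha}_k = \vartheta_k, \qquad (6)$$

i.e., the joint MAP phase estimate is simply the noisy (observed) phase.
Differentiating $J_1$ w.r.t. $a_k$ yields

$$\frac{\partial J_1}{\partial a_k} = -\frac{1}{\lambda_d(k)} \left[ (\mathbf{Y}_k^* - a_k e^{-j\alpha_k})(-e^{j\alpha_k}) + (\mathbf{Y}_k - a_k e^{j\alpha_k})(-e^{-j\alpha_k}) \right] - \frac{2 a_k}{\lambda_x(k)} + \frac{1}{a_k}.$$

Setting the above to zero implies

$$2 a_k^2 = \lambda_x(k) - \xi_k a_k \left[ 2 a_k - R_k e^{-j(\vartheta_k - \alpha_k)} - R_k e^{j(\vartheta_k - \alpha_k)} \right] = \lambda_x(k) - \xi_k a_k \left[ 2 a_k - 2 R_k \cos(\vartheta_k - \alpha_k) \right].$$

From (6), we have $\cos(\vartheta_k - \hat{\alpha}_k) = 1$; therefore

$$0 = 2(1 + \xi_k) a_k^2 - 2 R_k \xi_k a_k - \lambda_x(k).$$

Solving the above quadratic equation, and substituting

$$\lambda_x(k) = \frac{\xi_k}{\gamma_k} R_k^2, \qquad (7)$$

we have

$$\hat{A}_k = \frac{\xi_k + \sqrt{\xi_k^2 + 2(1 + \xi_k)\frac{\xi_k}{\gamma_k}}}{2(1 + \xi_k)}\, R_k. \qquad (8)$$

Together (8) and (6) define the following suppression rule:

$$H_k = \frac{\xi_k + \sqrt{\xi_k^2 + 2(1 + \xi_k)\frac{\xi_k}{\gamma_k}}}{2(1 + \xi_k)}$$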
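
As a quick numerical check of the joint MAP amplitude estimate, the sketch below evaluates the gain of (8), confirms that the resulting amplitude satisfies the quadratic equation preceding (7), and verifies that it is a local maximum of $J_1$ (with the phase set to the observed phase). The numerical values of the a priori SNR, the a posteriori SNR and $R_k$ are hypothetical, and the function names are invented for the example.

```python
import math


def joint_map_gain(xi, gamma):
    """Gain of the joint MAP rule: positive root of the quadratic, divided by R_k."""
    return (xi + math.sqrt(xi * xi + 2.0 * (1.0 + xi) * xi / gamma)) / (2.0 * (1.0 + xi))


def log_posterior(a, R, lam_x, lam_d):
    """J_1 up to a constant, with the phase fixed at the observed phase."""
    return -((R - a) ** 2) / lam_d - a * a / lam_x + math.log(a)


xi, gamma, R = 2.0, 4.0, 1.5      # hypothetical a priori SNR, a posteriori SNR, |Y_k|
lam_d = R * R / gamma             # from gamma_k = R_k^2 / lambda_d(k)
lam_x = xi * lam_d                # equivalently (7): lambda_x(k) = (xi_k / gamma_k) R_k^2

a_hat = joint_map_gain(xi, gamma) * R
residual = 2.0 * (1.0 + xi) * a_hat ** 2 - 2.0 * R * xi * a_hat - lam_x
print(joint_map_gain(xi, gamma), residual)
```

Only a square root and a handful of arithmetic operations are needed per bin, in contrast to the exponential and Bessel function evaluations required by (5).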


2.2.  Maximum  A  Posteriori  Spectral  Amplitude  Estimator 

First we note that the posterior density $p(a_k \mid \mathbf{Y}_k)$ arising from integration
over the phase term $\alpha_k$ is Rician with parameters $(\sigma_k^2, s_k^2)$:

$$p(a_k \mid \mathbf{Y}_k) = \frac{a_k}{\sigma_k^2} \exp\left(-\frac{a_k^2 + s_k^2}{2\sigma_k^2}\right) I_0\!\left(\frac{a_k s_k}{\sigma_k^2}\right) \qquad (9)$$

where

$$\sigma_k^2 \triangleq \frac{\lambda(k)}{2}, \qquad s_k \triangleq \frac{\xi_k}{1 + \xi_k}\, R_k; \qquad (10)$$

for large arguments of $I_0(\cdot)$ we may substitute the approximation

$$I_0(x) \approx \frac{1}{\sqrt{2\pi x}} \exp(x)$$

into (9), yielding

$$p(a_k \mid \mathbf{Y}_k) \approx \frac{1}{\sqrt{2\pi\sigma_k^2}} \left(\frac{a_k}{s_k}\right)^{1/2} \exp\left[-\frac{1}{2}\left(\frac{a_k - s_k}{\sigma_k}\right)^2\right], \qquad (11)$$

which is almost Gaussian. Considering (11), and maximising its
natural logarithm w.r.t. $a_k$, we obtain

$$J_2 = -\frac{1}{2}\left(\frac{a_k - s_k}{\sigma_k}\right)^2 + \frac{1}{2}\ln a_k + \text{constant}$$

$$\frac{d J_2}{d a_k} = -\frac{a_k - s_k}{\sigma_k^2} + \frac{1}{2 a_k}$$

$$0 = a_k^2 - s_k a_k - \frac{\sigma_k^2}{2}. \qquad (12)$$

Substituting (10) and (7) into (12) and solving, we arrive at an
estimator differing from that of the joint MAP solution only by
a factor of two under the square root (owing to the factor $\sqrt{a_k}$
in (11); replacement with $a_k$ would yield the joint MAP spectral
amplitude solution):

$$\hat{A}_k = \frac{\xi_k + \sqrt{\xi_k^2 + (1 + \xi_k)\frac{\xi_k}{\gamma_k}}}{2(1 + \xi_k)}\, R_k. \qquad (13)$$

Combining (13) with the Ephraim and Malah optimal phase estimator
(i.e., the observed phase $\vartheta_k$; cf. (6) also) yields the following
suppression rule:

$$H_k = \frac{\xi_k + \sqrt{\xi_k^2 + (1 + \xi_k)\frac{\xi_k}{\gamma_k}}}{2(1 + \xi_k)}$$

Fig. 1. Ephraim and Malah MMSE suppression rule (axes: instantaneous SNR (dB), a priori SNR (dB))

Fig. 2. Joint MAP suppression rule gain difference

Fig. 3. MAP approximation suppression rule gain difference

2.3.  Minimum  Mean-Square  Error  Spectral  Power  Estimator 

Recall that Ephraim and Malah formulate the first moment of a
Rician posterior distribution, $E[A_k \mid \mathbf{Y}_k]$, as a suppression rule. The
second moment $E[A_k^2 \mid \mathbf{Y}_k]$ of that distribution is given by a much
simpler formula (see, e.g., [3]):

$$E[A_k^2 \mid \mathbf{Y}_k] = 2\sigma_k^2 + s_k^2 \qquad (14)$$

where $\sigma_k^2$ and $s_k^2$ are as defined previously in (10). Letting $B_k = A_k^2$
and substituting for $\sigma_k^2$ and $s_k^2$ in (14) yields

$$\hat{B}_k = \frac{\xi_k}{1 + \xi_k} \left( \frac{1 + \upsilon_k}{\gamma_k} \right) R_k^2,$$

where $\hat{B}_k$ is the optimal spectral power estimator in the MMSE
sense, as it is also the first moment of a new posterior distribution
$p(b_k \mid \mathbf{Y}_k)$ having a noncentral chi-square probability density
function with two degrees of freedom and parameters $(\sigma_k^2, s_k^2)$.

Fig. 4. MMSE power suppression rule gain difference (axes: instantaneous SNR (dB), a priori SNR (dB))

When combined with the observed phase $\vartheta_k$, this estimator
also takes the form of a suppression rule:

$$H_k = \sqrt{ \frac{\xi_k}{1 + \xi_k} \left( \frac{1 + \upsilon_k}{\gamma_k} \right) } \qquad (15)$$
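
The MMSE spectral power rule can be checked numerically against the second-moment formula (14). The sketch below assumes the Rician parameters $\sigma_k^2 = \lambda(k)/2$ and $s_k = \xi_k R_k / (1 + \xi_k)$ implied by the Gaussian model of Section 1.2, and a gain of the form $\sqrt{\frac{\xi_k}{1+\xi_k}\frac{1+\upsilon_k}{\gamma_k}}$ obtained by substituting them into (14); the numerical SNR values and function names are hypothetical.

```python
import math


def wiener_gain(xi):
    """Wiener suppression rule: xi / (1 + xi)."""
    return xi / (1.0 + xi)


def mmse_power_gain(xi, gamma):
    """Square root of the MMSE spectral power estimate, divided by R_k."""
    v = xi / (1.0 + xi) * gamma
    return math.sqrt(xi / (1.0 + xi) * (1.0 + v) / gamma)


xi, gamma, R = 2.0, 4.0, 1.5           # hypothetical SNR values and |Y_k|
lam_d = R * R / gamma                  # noise variance from gamma = R^2 / lam_d
lam_x = xi * lam_d                     # signal variance from xi = lam_x / lam_d
lam = lam_x * lam_d / (lam_x + lam_d)  # 1/lam = 1/lam_x + 1/lam_d

sigma_sq = lam / 2.0                   # assumed Rician spread parameter
s = xi / (1.0 + xi) * R                # assumed Rician location parameter
second_moment = 2.0 * sigma_sq + s * s # right-hand side of (14)

gain = mmse_power_gain(xi, gamma)
print(gain, math.sqrt(second_moment) / R)
```

For a very large a posteriori SNR the gain approaches the Wiener gain, which the accompanying check also exercises.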


3.  COMPARISON  OF  APPROXIMATIONS 

Figure 1 shows the Ephraim and Malah suppression rule as a function
of instantaneous SNR (defined in [1] as $\gamma_k - 1$) and a priori
SNR $\xi_k$. Figures 2, 3, and 4 show the gain difference (in dB) between
it and each of the three derived suppression rules (note the
difference in scale). Table 1 shows a comparison
of the magnitude of gain differences for the three approximations.
The MMSE spectral power suppression rule provides the
best and most consistent approximation to the Ephraim and Malah
rule, with only slightly less suppression. The MAP spectral amplitude
approximation, although still within 5 dB of the optimal
value over a wide range of SNR, is the poorest. While the sign of
the deviation of each of these two approximations is constant, that
of the joint MAP suppression rule depends on the instantaneous
and a priori SNR.

                                          (γ_k − 1, ξ_k) ∈ [−30, 30] dB     (γ_k − 1, ξ_k) ∈ [−100, 100] dB
Suppression Rule                          Mean      Maximum   Range         Mean      Maximum   Range
MMSE Spectral Power                       0.68473   −1.0491   1.0469        0.63092   −1.0491   1.0491
Joint MAP Spectral Amplitude and Phase    0.52192   +1.7713   2.3352        0.74507   +1.9611   2.5250
MAP Spectral Amplitude Approximation      1.2612    +4.7012   4.7012        1.7423    +4.9714   4.9714

Table 1. Magnitude of deviation from Ephraim and Malah MMSE suppression rule gain


4.  DISCUSSION 


Ephraim and Malah [1] show that at high SNR, their derived suppression
rule approaches the Wiener suppression rule:

$$H_k = \frac{\xi_k}{1 + \xi_k}. \qquad (16)$$

Although not immediately obvious upon inspection of (5), this relationship
is easily seen in the MMSE spectral power suppression
rule given by (15), expanded slightly to the following:

$$H_k = \sqrt{ \frac{\xi_k}{1 + \xi_k} \left( \frac{1}{\gamma_k} + \frac{\xi_k}{1 + \xi_k} \right) }. \qquad (17)$$

As the instantaneous SNR $\gamma_k$ becomes large, (17) may be seen to
approach the Wiener suppression rule given by (16). As it becomes
small, the $1/\gamma_k$ term in (17) lessens the severity of the attenuation.
Cappé [4] makes the same qualitative observation concerning the
behaviour of the Ephraim and Malah suppression rule, although
the simpler form of the MMSE spectral power estimator shows the
influence of the a priori and a posteriori SNR more explicitly.

Lastly, we note that the success of the Ephraim and Malah
suppression rule is largely due to the so-called 'decision-directed
approach' for estimating the a priori SNR $\xi_k$ [4]. For a given short-time
block $l$, the decision-directed a priori SNR estimate $\hat{\xi}_k$ is
given by a geometric weighting of the SNR in the previous and
current blocks:

$$\hat{\xi}_k(l) = \alpha\, \frac{|\hat{\mathbf{X}}_k(l-1)|^2}{\lambda_d(k)} + (1 - \alpha) \max\left[0, \gamma_k(l) - 1\right], \quad \alpha \in [0, 1). \qquad (18)$$

It is instructive to consider the case in which $\xi_k = \gamma_k - 1$; i.e.,
$\alpha = 0$ in (18) so that the estimate of the a priori SNR is based
only  on  the  current  block.  In  this  case  the  MMSE  spectral  power 
suppression  rule  given  by  (17)  reduces  to  the  method  of  power 
spectral  subtraction  (see,  e.g.,  [2]).  Figure  5  shows  a  comparison 
of  the  derived  suppression  rules  under  this  constraint. 
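
The reduction to power spectral subtraction can be checked numerically. The sketch below assumes the usual decision-directed form, in which the first term is the previous block's estimated signal power divided by the noise variance, and an MMSE spectral power gain of the form $\sqrt{\frac{\xi_k}{1+\xi_k}\left(\frac{1}{\gamma_k} + \frac{\xi_k}{1+\xi_k}\right)}$; both are assumptions of this example, as are the function names and numerical values.

```python
import math


def decision_directed_xi(prev_power, lam_d, gamma, alpha):
    """A priori SNR estimate: weighting of previous-block and current-block SNR."""
    return alpha * prev_power / lam_d + (1.0 - alpha) * max(0.0, gamma - 1.0)


def mmse_power_gain(xi, gamma):
    """Assumed MMSE spectral power suppression rule."""
    w = xi / (1.0 + xi)
    return math.sqrt(w * (1.0 / gamma + w))


def power_spectral_subtraction_gain(gamma):
    """Power spectral subtraction: sqrt((gamma - 1) / gamma), floored at zero."""
    return math.sqrt(max(0.0, gamma - 1.0) / gamma)


gamma = 5.0                               # hypothetical a posteriori SNR
xi_current_only = decision_directed_xi(
    prev_power=0.0, lam_d=1.0, gamma=gamma, alpha=0.0
)                                         # alpha = 0: current block only
gain_mmse = mmse_power_gain(xi_current_only, gamma)
gain_pss = power_spectral_subtraction_gain(gamma)
print(gain_mmse, gain_pss)
```

With a nonzero weighting on the previous block, the a priori SNR estimate is smoothed over time and the two gains no longer coincide.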


5.  CONCLUSION 

Herein we have presented a derivation and comparison of three
simple alternatives to the Ephraim and Malah MMSE spectral amplitude
estimator. These may be implemented where increased efficiency
is desired, and each may be coupled with hypotheses concerning
uncertainty of speech presence, as in [1,2]. Moreover, the
form of the MMSE spectral power suppression rule given by (17)
provides a clear insight into the behaviour of the Ephraim and
Malah solution, and in particular its connection to simpler suppression
rules.

Fig. 5. Optimal and derived suppression rules

ABSTRACT

In this paper a new Spectral Distance Measure is
introduced and its properties explained. This measure is
specifically designed to evaluate distances between spectral
densities, and has important properties, such as invariance to
scaling factors, or shifts in amplitude. The measure may be used
as a Test for Whiteness, to determine the similarity between
independent processes, or to check the Quasi-stationarity
condition in a single process. Its special ability to detect spectral
similarities may be exploited for Speech Segmentation and in the
detection of Speech under strong noise levels, and may be used
in End-point Detection applications. The fundamentals of the
measure are given, some case studies are described and the
results discussed.


1.  INTRODUCTION 

One of the main problems in Noise-Robust Speech Recognition is
the detection of speech boundaries when the Signal-to-Noise
Ratio is so low that energy-based criteria cannot be
used. Inaccurate detection of speech gives incorrect
information to Speech Recognition Engines, thus increasing
recognition errors. Good algorithms for end-point detection and
speech segmentation have been designed and developed
in recent years, using energy-based thresholding,
stochastic spectral detection, neural networks, autocorrelation
techniques, etc. [2][1][5][8][9]. In previous work, Adaptive
Lattice-Ladder Filters have also been successfully used to
determine speech boundaries [6]. This work presents a different
method based on the application of a Spectral Distance Measure
(SDM) supported by Statistical Test Theory [3].

For  the  formulation  of  the  SDM  the  following  considerations  will 
be  made.  Assume  two  auto-regressive  processes  (x„)„eZ  and 
(y,Xez  defined  as: 

P. V 

xn  +  y  I-Vi-J  =  £jr.H  (N 

i=l 

Py 

y, i  ■*"  N.ay  i  y,i—i  =  fvvi  G) 

i=l 

where $\xi_{x,n}$ and $\xi_{y,n}$ are two independent and identically distributed
random processes with zero mean and variances given
by $\sigma_x^2$ and $\sigma_y^2$ respectively. Let $\varphi_x(\omega)$ and $\varphi_y(\omega)$ be their
respective positive spectral densities on $0<\omega<2\pi$, where $\omega$ is the
angular frequency:

$$2\pi\varphi_x(\omega) = \frac{\sigma_x^2}{|P_x(\omega)|^2} \qquad (3)$$

$$2\pi\varphi_y(\omega) = \frac{\sigma_y^2}{|P_y(\omega)|^2} \qquad (4)$$

$P_x(\omega)$ and $P_y(\omega)$ being respectively the transfer functions in the
domain of z associated with both processes evaluated on the
unit circle, i.e.:

$$P_x(z=e^{j\omega}) = 1 + \sum_{k=1}^{p} a_{x,k}\, e^{-jk\omega} \qquad (5)$$

$$P_y(z=e^{j\omega}) = 1 + \sum_{k=1}^{p} a_{y,k}\, e^{-jk\omega} \qquad (6)$$

p being the maximum order of both processes, setting the
corresponding non-existing coefficients of the lowest-order
process to zero: $p=\max\{p_x,p_y\}$.

Let $\rho(\omega)$ denote the positive ratio between both spectral
densities:

$$\rho(\omega) = \frac{\varphi_x(\omega)}{\varphi_y(\omega)} \qquad (7)$$
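The definitions (3) to (7) can be evaluated directly from the model coefficients. The following is a minimal Python sketch, assuming the AR coefficients and noise variances are already known; the function names `ar_spectral_density` and `spectral_ratio` are illustrative and do not come from the paper.

```python
import cmath
import math


def ar_spectral_density(coeffs, noise_variance, omega):
    """Evaluate phi(omega) = sigma^2 / (2*pi*|P(omega)|^2) as in (3)-(4).

    coeffs holds a_1..a_p, so that P(omega) = 1 + sum_k a_k e^{-jk omega}
    as in (5)-(6).
    """
    transfer = 1.0 + sum(
        a * cmath.exp(-1j * k * omega) for k, a in enumerate(coeffs, start=1)
    )
    return noise_variance / (2.0 * math.pi * abs(transfer) ** 2)


def spectral_ratio(coeffs_x, var_x, coeffs_y, var_y, omega):
    """Ratio rho(omega) of the two spectral densities, as in (7)."""
    return (ar_spectral_density(coeffs_x, var_x, omega)
            / ar_spectral_density(coeffs_y, var_y, omega))


# Same AR system, different input variances: the ratio is flat in omega.
for omega in (0.1, 1.0, 2.5):
    print(omega, spectral_ratio([-0.9], 2.0, [-0.9], 0.5, omega))
```

When both processes share the same coefficients, the ratio reduces to the quotient of the input variances at every frequency, which is the situation the SDM is designed to detect.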


The SDM test is thus based on checking whether this ratio is
constant or not over the frequency span. In other words, if the
ratio is nearly constant for all the frequencies considered, one
may conclude that both autoregressive processes are identical
except for a factor explainable as the power ratio of the noise input
processes (given by $\sigma_x^2/\sigma_y^2$).

The proposed measure may then be formulated as:

$$D(\rho) = D\{\varphi_x(\omega), \varphi_y(\omega)\} = \log\left[\frac{1}{2\pi}\int_{-\pi}^{\pi}\rho(\omega)\,d\omega\right] - \frac{1}{2\pi}\int_{-\pi}^{\pi}\log\rho(\omega)\,d\omega \qquad (8)$$

It may be shown that D fulfils the following properties:

•  (P1):  $D(\rho) \ge 0$ for all $\rho(\omega)$: $0<\omega<2\pi$.

•  (P2):  $D(\lambda\rho) = D(\rho)$ for any real positive number $\lambda$.

•  (P3):  $D(\rho) = 0$ iff $\rho(\omega)$ = constant almost
everywhere.

Property (P1) follows directly from Jensen's inequality.
Property  (P2)  says  that  D  is  invariant  to  a  change  in  the  scale 
factor  between  the  power  distribution  (variance)  of  both 
processes.  Property  (P3)  is  by  far  the  most  important  one,  as  it 
establishes  that  if  the  ratio  between  the  power  distribution  of  two 
processes  is  almost  constant  they  may  be  considered  as  generated 
by  the  same  system  fed  with  two  white  noise  processes  with 
different  variances,  thus  separating  the  contribution  of  the 
systems  from  that  of  their  inputs  in  the  overall  behavior  of  both 
power  distributions. 
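Properties (P1) to (P3) can be checked numerically on sampled spectra. The following is a minimal Python sketch, assuming both spectral densities are given as equal-length lists of strictly positive samples on a uniform frequency grid, so that the integrals in (8) reduce to averages; the name `spectral_distance` is illustrative, not taken from the paper.

```python
import math


def spectral_distance(phi_x, phi_y):
    """Discrete form of (8): log(mean(rho)) - mean(log(rho))."""
    if len(phi_x) != len(phi_y) or not phi_x:
        raise ValueError("spectra must be non-empty and of equal length")
    rho = [px / py for px, py in zip(phi_x, phi_y)]
    mean_rho = sum(rho) / len(rho)
    mean_log_rho = sum(math.log(r) for r in rho) / len(rho)
    return math.log(mean_rho) - mean_log_rho


phi_x = [1.0, 2.0, 4.0, 0.5]
phi_y = [2.0, 1.0, 1.0, 3.0]
print(spectral_distance(phi_x, phi_y))                       # positive (P1)
print(spectral_distance([5.0 * v for v in phi_x], phi_y))    # unchanged by scaling (P2)
print(spectral_distance(phi_x, [3.0 * v for v in phi_x]))    # numerically zero (P3)
```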

[Figure 1 panel: b) Log. Spectral Density]

Figure 1. Two sliding and non-overlapping windows N
samples long, separated by M samples to ensure
independence, are used to define processes $x_n$ and $y_n$.

From property (P3) an absolute measure of the distance from a
given process $x_n$ with respect to a hypothetical white noise
process $\xi_n$ with spectral density given by $\sigma^2/2\pi$ = constant may be
implemented just by imposing $\varphi_y=1$ in (7), this constituting a Test
for Whiteness of process $x_n$:

$$D_w(\varphi_x) = \log\left[\frac{1}{2\pi}\int_{-\pi}^{\pi}\varphi_x(\omega)\,d\omega\right] - \frac{1}{2\pi}\int_{-\pi}^{\pi}\log\varphi_x(\omega)\,d\omega \qquad (9)$$

where $D_w$ establishes the distance of the considered process $x_n$
with respect to randomness. This is the essential theory
underlying the proposed methodology. The rest of the
paper details this methodology (Section 2) and discusses its
application to Speech Segmentation (Section 3) and End-Point
Detection (Section 4). A brief discussion
(Section 5) draws the most relevant conclusions on the
applicability of the technique.
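As a worked note on the whiteness measure (9), assume the spectral density is sampled at K uniformly spaced frequencies $\omega_k$ (K and $\omega_k$ are introduced here only for illustration). The measure then becomes the logarithm of the ratio between the arithmetic and geometric means of the samples, which by the arithmetic-geometric mean inequality is zero only for a flat spectrum:

```latex
D_w(\varphi_x) \approx
  \log\!\left[\frac{1}{K}\sum_{k=0}^{K-1}\varphi_x(\omega_k)\right]
  - \frac{1}{K}\sum_{k=0}^{K-1}\log\varphi_x(\omega_k)
  = \log\frac{\frac{1}{K}\sum_{k=0}^{K-1}\varphi_x(\omega_k)}
             {\left(\prod_{k=0}^{K-1}\varphi_x(\omega_k)\right)^{1/K}}
  \;\ge\; 0
```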

2.  TEST  METHODOLOGY 

The present paper investigates the applicability of the SDM to the
detection of speech in noise. Without loss of
generality it will be considered that the processes to be compared
are extracted from the same process, namely a signal frame as
given in Figure 1. To implement the test two sliding windows
will be used, each one being N samples long, separated by an
interval of M samples. Two techniques may be
used to estimate the spectral densities of the traces
in the sliding windows:

a)  To use the squared absolute value of the FFT applied to the
Hamming-windowed N-sample trace.

b)  To use the spectral envelope defined by the p-th order
Auto-regressive Model of the N-sample trace.

The latter possibility is implied in the example given in
Figure 1. In the examples given throughout the paper an N-point
FFT was used instead.


A trade-off must be established regarding the size N and
separation M of both windows to preserve the quasi-stationarity
conditions on one side, and to ensure independence on the other.
It is well established in Speech Processing and Recognition that
quasi-stationarity is well preserved with time windows not
exceeding 10 msec. This limit can be used to
establish both M and N, in the sense that they should not
exceed the number of samples that, at the given
sampling frequency, keeps the time intervals below the
mentioned value. In the examples shown the time windows were
N=220 samples long for a sampling frequency of f=22050 Hz, and
the separation interval was M=0; therefore the sliding windows are
contiguous and non-overlapping. Another important aspect to be
considered is the possibility of carrying out the test sample by
sample or by blocks of samples. This choice has a direct
impact on the computational cost of the method. Block
processing is used in the examples presented.

In principle
the tests are devised to establish the similarity between processes,
or the whiteness of a given process. Therefore, if it may be
assumed that the noise contaminating speech is purely white, it will
be quite simple to detect where speech is present just by
checking a given segment of a record against itself, using the
whiteness test described in Section 1. For non-white noise the
comparison of near-neighbor
segments of a signal supposedly containing speech
would also give useful hints for end-point segmentation,
assuming that the contaminating noise is quasi-stationary. In case
this condition cannot be guaranteed, the measure may be used
to detect the degree to which a given process departs from the
quasi-stationary condition, adding a new feature to the
capabilities of the SDM.
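The block-processing test with two contiguous windows can be sketched as follows. This is a minimal Python illustration, assuming technique a) (squared DFT magnitude of a Hamming-windowed frame) computed with a direct DFT from the standard library; the function names, the spectral floor and the toy two-tone signal are assumptions of this sketch, not part of the paper.

```python
import cmath
import math


def power_spectrum(frame, floor=1e-12):
    """Squared DFT magnitude of a Hamming-windowed frame.

    The floor (an assumption of this sketch) keeps the logarithm in the
    distance finite when a bin is numerically zero.
    """
    n = len(frame)
    window = [0.54 - 0.46 * math.cos(2.0 * math.pi * k / (n - 1)) for k in range(n)]
    tapered = [s * w for s, w in zip(frame, window)]
    spectrum = []
    for m in range(n):
        acc = sum(tapered[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n))
        spectrum.append(max(abs(acc) ** 2, floor))
    return spectrum


def spectral_distance(phi_x, phi_y):
    """Discrete form of the distance (8)."""
    rho = [px / py for px, py in zip(phi_x, phi_y)]
    mean_rho = sum(rho) / len(rho)
    mean_log = sum(math.log(r) for r in rho) / len(rho)
    return math.log(mean_rho) - mean_log


def sliding_distances(signal, n, gap=0):
    """Distance between each N-sample window and the one `gap` samples after it."""
    distances = []
    start = 0
    while start + 2 * n + gap <= len(signal):
        x = signal[start:start + n]
        y = signal[start + n + gap:start + 2 * n + gap]
        distances.append(spectral_distance(power_spectrum(x), power_spectrum(y)))
        start += n  # block processing: advance by one whole window
    return distances


# Toy signal: the tone frequency changes at sample 128 (end of the fourth window).
n = 32
signal = [math.sin(2.0 * math.pi * 4 * k / n) for k in range(4 * n)]
signal += [math.sin(2.0 * math.pi * 9 * k / n) for k in range(4 * n)]
trace = sliding_distances(signal, n)
print(trace.index(max(trace)))  # index of the window pair straddling the change
```

With M=0 the `gap` argument is zero, matching the contiguous windows used in the examples of the paper; a sample-by-sample variant would advance `start` by one instead of `n`, at a much higher cost.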

3.  SPEECH  SEGMENTATION 

The first experiment is intended to measure the
spectral variations inside a clean speech frame. For the analysis
an utterance of the word /manθana/ (apple) was used. Figure 2.a
shows the time-domain speech trace. Its spectrogram using a
220-point FFT is shown in Figure 2.b. The so-called
Global Distance Measure summarizing the most important
spectral changes is given in Figure 2.c.

[Figure 2 panel: a) Speech Trace. Time Window = 5 msec.]

Figure 2. a) Speech Trace for /manθana/. b) Associated
Spectral Density from 220-point FFT. c) Global Distance
Measure.

It may be observed that each time an important change in the
spectral density takes place the Global Distance Measure
increases, to decrease again immediately after. The peaks
mark the spectral changes quite reasonably. Therefore the
speech frame could be divided into different sectors of
quasi-stationary behavior. The most dramatic change takes place
around templates 54 and 55, and apparently corresponds to the
transition inside a given vowel [a] from a non-nasal character in
[θa] to a nasalized coloring in the articulation of [an]. Another
important transition takes place around windows 71-72, when the
nasalization ends.

4.  END-POINT  DETECTION

Another important problem to be treated using the technique
described is End-point Detection for speech traces in the
presence of strong noise levels. Under this assumption three
different cases may be studied:

a)  The contaminating noise is stationary and white. In this case
the Spectral Distance Measure may be very efficient, as it
may detect the presence of speech using two combined hints:
where the spectral density changes from a white case to a
non-white case using (9) on a single sliding window, or
where there is a spectral change when comparing two sliding
windows using (8).

b)  Noise is colored and quasi-stationary. In this case spectral
changes will be spotted using two sliding windows and (8).

c)  Noise is colored and non-stationary. This is the worst case, as
there will be no means to infer whether the spectral changes are due
to hidden speech or to noise spectral changes. Even in this
case it will be useful to detect changes in the spectral density
function, as they may be hints to the possible presence of
speech. One such example is given in Figure 3.


Figure 3. a) Noisy Speech Trace. Actual SNR is under
-5 dB. The underlying speech trace corresponding to an
utterance of the isolated word /eight/ cannot be seen. b)
Associated Spectral Density showing the presence of
strong noise components. c) Global Distance Measure
pinpointing the possible presence of speech.

The Noisy Speech Trace, corresponding to isolated words, was
recorded under strong noise levels
(racing car noise above 95 dB SPL). Under these conditions neither
the time signal nor the
associated spectrogram allows the presence of speech to be inferred.
Nevertheless the Global Distance Measure in Figure 3.c gives
specific points where important spectral changes take place, these
being around windows 46 and 57. Other smaller changes are
spotted for templates around 73, 82 or 106. To see whether this
detection corresponds to actual speech segments, a speech
enhancement technique previously developed [7] and [4] was
used. The results may be seen in Figure 4, to be contrasted against
the data in Figure 3.


[Figure 4 panel: a) Enhanced Speech Trace. Time Window = 5 msec.]

Figure 4. a) Enhanced Speech Trace using the technique
described in [7] and [4]. b) Associated Spectral Density
showing the spectrogram of the word /eight/.

In fact, it may be seen that within window frames 46 and 58 there
is a speech frame present, corresponding to the isolated word
embedded in noise.


Figure  5.  Matrix  corresponding  to  the  measure  of  each 
window  with  respect  to  other  windows  in  the  speech 
trace,  showing  clearly  the  boundaries  in  the  spectrogram 
corresponding  to  the  spotting  factor  given  in  Figure  3.c. 

5.  DISCUSSION 


To study the consistency of the results an exhaustive test was
carried out cross-contrasting the spectral density estimates
corresponding to the whole set of windows in the speech frame
of Figure 3 among themselves, the corresponding results being
given as a matrix $D_{ij}$:

$$D_{ij} = D(\rho_{ij}); \quad 0 \le i \le N_w-1; \quad 0 \le j \le N_w-1 \qquad (10)$$

where $N_w$ is the total number of time windows used in the
analysis described. As would be expected, the main diagonal
of the matrix D is $D_{ii}=0$, as the distance measure (8) of a
window to itself is zero. It is important to remark that the matrix
is not symmetric with respect to the main diagonal, because from (7):

$$\rho_{ij}(\omega) = \frac{1}{\rho_{ji}(\omega)} \qquad (11)$$

therefore the condition that:

$$D_{ij} = D_{ji} \qquad (12)$$

will signal the areas where the spectral density is white. The most
important property of the Cross-Distance Matrix is that its
columns will spot the spectral changes, as they will show a
correlation between a given window $W_i$ and all the other windows
$W_j$, $0 \le j \le N_w-1$. The lighter columns (those keeping higher
distances with respect to the other windows) will signal the
spectral boundaries. It may be seen that there is a perfect
correspondence between these boundaries and the ones given in
Figure 3.c.
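The Cross-Distance Matrix of (10) can be illustrated on a handful of sampled spectra. This is a minimal Python sketch, assuming each window is already represented by a list of strictly positive spectral samples; `cross_distance_matrix`, `column_scores` and the toy spectra are illustrative names and data, not the paper's.

```python
import math


def spectral_distance(phi_x, phi_y):
    """Discrete form of the distance (8)."""
    rho = [px / py for px, py in zip(phi_x, phi_y)]
    mean_rho = sum(rho) / len(rho)
    mean_log = sum(math.log(r) for r in rho) / len(rho)
    return math.log(mean_rho) - mean_log


def cross_distance_matrix(spectra):
    """D[i][j] is the distance computed from the ratio of window i to window j."""
    return [[spectral_distance(pi, pj) for pj in spectra] for pi in spectra]


def column_scores(matrix):
    """Mean distance in each column; large values flag spectrally distinct windows."""
    size = len(matrix)
    return [sum(row[j] for row in matrix) / size for j in range(size)]


# Three flat (white) windows and one window with a spectral peak at index 2.
flat = [1.0, 1.0, 1.0, 1.0]
peaked = [1.0, 1.0, 8.0, 1.0]
matrix = cross_distance_matrix([flat, flat, peaked, flat])
scores = column_scores(matrix)
print(scores.index(max(scores)))        # the column of the peaked window
print(matrix[0][2], matrix[2][0])       # the matrix is not symmetric
```

In this toy case the pairs of white windows satisfy the symmetry condition (12) trivially, while the entries involving the peaked window differ depending on which spectrum is in the numerator.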

There are many other fields of application for the Spectral
Distance Measure presented, for example random signal
whitening. Most random generators do not produce truly white
traces, although they are widely used in noise-contamination
experiments. Using the measure introduced here a pre-produced
trace may be whitened to approach a flat spectral density with a
ripple error of less than 0.01 dB. Long white series may be
produced this way with important practical applications, such as
sound equipment calibration and similar tasks.
ABSTRACT

With the growing capacity demand in wireless communication
systems, space division multiplexing and space-time
processing by means of antenna arrays are becoming ever
more attractive as technologies to improve system performance,
especially for the reduction of multipath effects. This
paper presents a new low-complexity, high-accuracy algorithm
to estimate the multipath delays and directions of
arrival (DOAs) simultaneously in wireless communication
systems. By using separable dimension correlation processing,
the temporal and spatial signal subspaces are formed
and the joint two-dimensional delay/DOA estimation problem
is separated into two simpler one-dimensional estimations.

1.  INTRODUCTION 

Three major performance- and capacity-limiting impairments
in current mobile communication systems are multipath
fading, intersymbol interference (ISI) and co-channel interference
(CCI). In particular, the ISI impairment resulting from
delay spread constrains the maximum data rate. Current
mobile communication systems, using temporal processing
alone, cannot effectively address these impairments. By
means of an antenna array, a combination of temporal and
spatial processing can potentially yield good performance
improvements over existing systems. Several joint delay
and direction estimation algorithms for signals in multipath
environments have thus been developed recently [1, 2, 3].

This paper proposes a new low-complexity, high-accuracy
algorithm based on a separable dimension subspace
method [7] to estimate the multipath delays and directions of
arrival (DOAs) simultaneously. With separable dimension
processing, the joint spatial and temporal estimation problem
is separated, i.e., the delays are first estimated by using a
one-dimensional subspace method and then the DOAs are
estimated for each estimated delay. In this way, the computational
complexity of the proposed method is reduced
while its performance for joint delay/DOA estimation
remains satisfactory, as supported by computer simulations.

2.  PROBLEM  FORMULATION 

Consider a base station receiving array composed of M antennas
and assume that the single user signal of interest arrives
at the base station via D paths, with the DOA of the
ith path denoted as $\theta_i$ ($i = 1, 2, \cdots, D$). Then, the received
complex baseband signal vector at the antenna array can be
described as:

$$\mathbf{x}(t) = \sum_{i=1}^{D} \mathbf{a}(\theta_i)\, \beta_i\, r(t - \tau_i) + \mathbf{n}(t) \qquad (1)$$

where $\mathbf{a}(\theta_i)$ is an $M \times 1$ spatial steering vector for the ith
path, $\beta_i$ is the complex fading factor of the ith ray, $r(t)$ is a
transmitted complex baseband signal, $\tau_i$ is the ith path propagation
delay and $\mathbf{n}(t)$ is a spatially and temporally white
additive Gaussian noise with zero mean and equal covariance
$\sigma^2$.

In a linear time-invariant system, the transmitted signal
$r(t)$ can be represented as a convolution of the data bits and
a pulse shaping function $g(t)$, i.e. $r(t) = \sum_l s_l\, g(t - lT_s)$.
Therefore, by passing the signal vector $\mathbf{x}(t)$ through a set
of tapped-delay lines (TDL) of length Q and delay $T_0$, as
shown in Figure 1, and sampling the resulting outputs, a
data matrix $\mathbf{X}[n]$ is formed as:

$$\mathbf{X}[n] = [\mathbf{x}_1^T[n]\ \ \mathbf{x}_2^T[n]\ \cdots\ \mathbf{x}_M^T[n]]^T = \sum_{i=1}^{D} \mathbf{a}(\theta_i)\,\beta_i \otimes [\mathbf{G}(\tau_i)\ \cdots\ \mathbf{G}(\tau_i + LT_s)]\,\mathbf{s}[n] + \mathbf{N}[n] \qquad (2)$$

where the symbol $\otimes$ denotes the Kronecker product,
$\mathbf{G}(\tau_i) = [g(t_0 - \tau_i),\ g(t_0 - T_0 - \tau_i),\ \cdots,\ g(t_0 - (Q-1)T_0 - \tau_i)]^T$
is referred to as the temporal manifold, $g(\cdot)$ is the
pulse shaping function, which models the total impulse response
of the filters used in the system, $t_0$ is the sampling
reference time of the nth data, $\mathbf{s}[n] = [s(n)\ s(n-1)\ \cdots\ s(n-L)]^T$
is a vector consisting of L + 1 consecutive symbols,
$T_s$ is the symbol duration and L is the length of the channel,
which covers the range of delays $\{\tau_i\}_{i=1}^{D}$. We assume that
a training sequence is embedded in the transmitted signal;
the training portion, represented here by $\mathbf{s}[n]$, can be extracted
by the receiver and is assumed to be known. Typically,
in TDMA systems, $L = L_g + \tau_{max}/T_s$, where $L_g T_s$ is
the duration of the pulse shaping function $g(t)$ and $\tau_{max}$ is
the maximum integer delay [4]. Likewise, in DS-CDMA
systems, $L = 2N_c$, where $N_c = T_{cs}/T_c$, $T_{cs}$ is the data
symbol period and $T_c$ is the chip duration [5].




Fig.  1.  Array  front-end  with  TDLs 


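Equation (2) can be rewritten as:

$$\mathbf{X}[n] = \sum_{i=1}^{D} \mathbf{a}(\theta_i)\,\beta_i \otimes \mathcal{G}(\tau_i) + \mathbf{N}[n] = [\mathbf{a}(\theta_1)\ \cdots\ \mathbf{a}(\theta_D)] \circ [\mathcal{G}(\tau_1)\ \cdots\ \mathcal{G}(\tau_D)]\,\mathbf{B} + \mathbf{N}[n] = \mathbf{D}(\theta,\tau)\,\mathbf{B} + \mathbf{N}[n] \qquad (3)$$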


where $\circ$ denotes the Khatri-Rao product (see [6]), which
represents the column-wise Kronecker product;

$$\mathcal{G}(\tau_i) = [\mathbf{G}(\tau_i)\ \cdots\ \mathbf{G}(\tau_i + LT_s)]\,\mathbf{s}[n]$$

is a modified temporal manifold, which is the convolution
between the training sequence and the delayed shaping function;

$$\mathbf{D}(\theta,\tau) = [\mathbf{a}(\theta_1)\ \cdots\ \mathbf{a}(\theta_D)] \circ [\mathcal{G}(\tau_1)\ \cdots\ \mathcal{G}(\tau_D)]$$

is a spatio-temporal manifold and $\mathbf{B} = [\beta_1\ \beta_2\ \cdots\ \beta_D]^T$.
Thus, based on the available samples $\{\mathbf{X}[n]\}_{n=1}^{N}$, the
problem of interest here is to estimate the DOAs $\theta_i$ and the
multipath delays $\tau_i$ ($i = 1, 2, \cdots, D$) simultaneously with
subspace methods.

3.  SUBSPACE  PARTITION

Assume that the number of paths D, the maximum path
delay $\tau_{max}$, the array response $\mathbf{a}(\cdot)$ and the pulse shaping
function $g(\cdot)$ are known. Also assume that the complex path
fading factors $\{\beta_i\}_{i=1}^{D}$ remain constant during a data symbol
period.

Define

$$\mathbf{r}'_h = \sum_{i=1}^{M} E[x_i(nT_s - h \cdot T_0)\,\mathbf{x}_i^H(n)], \quad h = 0, 1, \cdots, D-1$$

$$\mathbf{v}'_l = \sum_{h=0}^{Q-1} E[x_l(nT_s - h \cdot T_0)\,\mathbf{Y}^H(nT_s - h \cdot T_0)], \quad l = 1, 2, \cdots, D \qquad (5)$$

where

$$\mathbf{Y}(t) = [x_1(t)\ \cdots\ x_M(t)]^T$$

and $E(\cdot)$ denotes mathematical expectation. We refer to
$\{\mathbf{r}'_h\}_{h=0}^{D-1}$ and $\{\mathbf{v}'_l\}_{l=1}^{D}$ as the sets of temporal vectors and
spatial vectors, respectively. It can be shown that the linear
spaces spanned by these sets of vectors are equal to the
range spaces of $\mathcal{G}(\tau)$ and $\mathbf{a}(\theta)$, respectively. That is, let
$\mathbf{R} = [\mathbf{r}'_0, \mathbf{r}'_1, \cdots, \mathbf{r}'_{D-1}]$ and $\mathbf{V} = [\mathbf{v}'_1, \mathbf{v}'_2, \cdots, \mathbf{v}'_D]$; then we
have

$$\mathcal{R}(\mathbf{R}) = \mathcal{R}(\mathcal{G}(\tau)), \quad \mathcal{R}(\mathbf{V}) = \mathcal{R}(\mathbf{a}(\theta)) \qquad (6)$$

where $\mathcal{R}(\cdot)$ denotes the range space of its matrix argument,
$\mathcal{G}(\tau) = [\mathcal{G}(\tau_1)\ \cdots\ \mathcal{G}(\tau_D)]$ and $\mathbf{a}(\theta) = [\mathbf{a}(\theta_1)\ \cdots\ \mathbf{a}(\theta_D)]$.

4.  SEPARABLE  DIMENSIONAL  ALGORITHM

Equation (6) suggests that a subspace method may be used
independently to estimate the DOA and delay parameters. Specifically,
we may use correlation processing in the spatial and temporal
dimensions respectively to get estimates of $\{\mathbf{r}'_h\}_{h=0}^{D-1}$
and $\{\mathbf{v}'_l\}_{l=1}^{D}$. These estimates are then used separately to
generate null subspace projections. Finally, the path delays
and DOAs can be estimated with subspace methods by two
one-dimensional searches. This leads to the following algorithm:

Step 1. Formation of temporal and spatial projection
matrices


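(1) Estimation of temporal vectors:

$$\hat{\mathbf{r}}'_h = \frac{1}{M}\sum_{i=1}^{M}\frac{1}{N}\sum_{n=1}^{N} x_i(nT_s - h \cdot T_0)\,\mathbf{x}_i^H(n) \qquad (7)$$

$$\hat{\mathbf{r}}_h = (\hat{\mathbf{r}}'_h - \hat{\sigma}^2 \mathbf{e}_h)^H, \quad h = 0, \cdots, D-1 \qquad (8)$$

where $\mathbf{e}_h = [\underbrace{0 \cdots 0}_{h}\ 1\ 0 \cdots 0]$ and $\hat{\sigma}^2$ is an estimate of the
noise variance.

(2) Gram-Schmidt (GS) orthogonalization and formation of
temporal projection matrix $\mathbf{P}_\tau$: From the vectors $\{\hat{\mathbf{r}}_h\}_{h=0}^{D-1}$,
we can get D orthogonal (unit-norm) vectors $\{\mathbf{q}_k\}_{k=1}^{D}$ via GS
orthogonalization. Let $\mathbf{Q}_\tau = [\mathbf{q}_1, \mathbf{q}_2, \cdots, \mathbf{q}_D]$, then compute the
temporal projection matrix $\mathbf{P}_\tau = \mathbf{I} - \mathbf{Q}_\tau\mathbf{Q}_\tau^H$, which spans
the null space of $\{\mathcal{G}(\tau_i)\}_{i=1}^{D}$.

(3) Estimation of spatial vectors:

$$\hat{\mathbf{v}}'_l = \frac{1}{Q}\sum_{h=0}^{Q-1}\frac{1}{N}\sum_{n=1}^{N} x_l(nT_s - h \cdot T_0)\,\mathbf{Y}^H(nT_s - h \cdot T_0) \qquad (9)$$

$$\hat{\mathbf{v}}_l = (\hat{\mathbf{v}}'_l - \hat{\sigma}^2 \mathbf{e}_l)^H, \quad l = 1, 2, \cdots, D \qquad (10)$$

(4) Gram-Schmidt (GS) orthogonalization and formation
of spatial projection matrix $\mathbf{P}_\theta$: Via Gram-Schmidt
orthogonalization of $\{\hat{\mathbf{v}}_l\}_{l=1}^{D}$, the D orthogonal (unit-norm) vectors $\{\mathbf{c}_l\}_{l=1}^{D}$
and the spatial orthogonal projection matrix $\mathbf{P}_\theta = \mathbf{I} - \mathbf{Q}_\theta\mathbf{Q}_\theta^H$
are obtained, where $\mathbf{Q}_\theta = [\mathbf{c}_1\ \mathbf{c}_2\ \cdots\ \mathbf{c}_D]$.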

Step 2. Multipath delays and directions of arrival estimation

The path delays $\{\tau_k\}_{k=1}^{D}$ are estimated as the D largest
peaks of the function $P(\tau) = (\mathcal{G}^H(\tau)\,\mathbf{P}_\tau\,\mathcal{G}(\tau))^{-1}$, searching
over the delay sector of interest, measured in symbol
periods T. Likewise, the DOAs $\{\theta_k\}_{k=1}^{D}$ are estimated by
searching over the direction sector of interest to get the D
largest peaks of the function $P(\theta) = (\mathbf{a}^H(\theta)\,\mathbf{P}_\theta\,\mathbf{a}(\theta))^{-1}$.
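The orthogonalization and one-dimensional search for the DOAs can be sketched as follows. This is a minimal Python illustration, assuming a uniform linear array with half-wavelength spacing and spatial vectors that already span the range space of the steering vectors; the function names, the search grid, the numerical floor and the noise-free test vectors are assumptions of this sketch, not the paper's implementation. The delay search has the same structure with the temporal manifold in place of the steering vector.

```python
import cmath
import math


def steering_vector(theta_deg, m):
    """Steering vector of an m-element linear array, half-wavelength spacing (assumed)."""
    phase = math.pi * math.sin(math.radians(theta_deg))
    return [cmath.exp(-1j * phase * k) for k in range(m)]


def inner(u, v):
    """Hermitian inner product u^H v."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def gram_schmidt(vectors, tol=1e-10):
    """Return an orthonormal basis of the span of the given vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = inner(q, w)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(inner(w, w).real)
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis


def null_spectrum(candidate, basis, floor=1e-12):
    """1 / (a^H (I - Q Q^H) a), with a floor to avoid division by zero."""
    energy = inner(candidate, candidate).real
    projected = sum(abs(inner(q, candidate)) ** 2 for q in basis)
    return 1.0 / max(energy - projected, floor)


def estimate_doas(spatial_vectors, m, count, grid_step=0.5):
    """Pick the `count` largest peaks of the null spectrum over [-90, 90] degrees."""
    basis = gram_schmidt(spatial_vectors)
    grid = [-90.0 + grid_step * k for k in range(int(round(180.0 / grid_step)) + 1)]
    spectrum = [null_spectrum(steering_vector(t, m), basis) for t in grid]
    peaks = [
        (spectrum[k], grid[k])
        for k in range(1, len(grid) - 1)
        if spectrum[k] > spectrum[k - 1] and spectrum[k] >= spectrum[k + 1]
    ]
    peaks.sort(reverse=True)
    return sorted(theta for _, theta in peaks[:count])


# Noise-free toy case: three mixtures of the steering vectors at 15, 40 and 70 degrees.
a15, a40, a70 = (steering_vector(t, 6) for t in (15.0, 40.0, 70.0))
mixtures = [
    [x + 0.5 * y for x, y in zip(a15, a40)],
    [x + 0.3 * y for x, y in zip(a40, a70)],
    [x - 0.2 * y for x, y in zip(a70, a15)],
]
print(estimate_doas(mixtures, m=6, count=3))
```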

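Step 3. Delay and DOA pairing

With the estimated delays $\hat{\tau}_1, \hat{\tau}_2, \cdots, \hat{\tau}_D$, if we select the
estimated DOAs $\hat{\theta}_i$ ($i = 1, 2, \cdots, D$) to minimize the cost
function $\varepsilon$, then the delays $\hat{\tau}_i$ and the DOAs $\hat{\theta}_i$ can be paired.
The cost function $\varepsilon$ is:

$$\varepsilon = \mathbf{D}^H(\hat{\theta},\hat{\tau})\,\mathbf{E}_n\mathbf{E}_n^H\,\mathbf{D}(\hat{\theta},\hat{\tau}) \qquad (11)$$

where $\mathbf{D}(\hat{\theta},\hat{\tau}) = \mathbf{a}(\hat{\theta}) \circ \mathcal{G}(\hat{\tau})$ is the joint spatial-temporal vector
and $\mathbf{E}_n$ is a matrix whose columns are the eigenvectors
corresponding to the smallest eigenvalues of the covariance
matrix $\mathbf{R}_x = E[\mathbf{X}[n]\,\mathbf{X}^H[n]]$.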
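The pairing step can be illustrated with a small sketch. This is a hypothetical Python example, assuming the noise-subspace eigenvectors are already available as a list of columns and that all pairings are compared exhaustively (the paper does not state how the minimization is organized); the function names and the toy vectors are illustrative only. For single columns the Khatri-Rao product reduces to the Kronecker product of the two vectors.

```python
from itertools import permutations


def kron(a, g):
    """Kronecker product of two vectors (Khatri-Rao product of single columns)."""
    return [ai * gj for ai in a for gj in g]


def pairing_cost(a, g, noise_basis):
    """Cost (11): squared norm of the projection of a∘g onto the noise subspace."""
    d = kron(a, g)
    return sum(
        abs(sum(e.conjugate() * x for e, x in zip(column, d))) ** 2
        for column in noise_basis
    )


def pair_delays_with_doas(delay_vectors, doa_vectors, noise_basis):
    """Return perm where perm[i] is the index of the DOA paired with delay i."""
    indices = range(len(doa_vectors))
    best = min(
        permutations(indices),
        key=lambda perm: sum(
            pairing_cost(doa_vectors[j], delay_vectors[i], noise_basis)
            for i, j in enumerate(perm)
        ),
    )
    return list(best)


# Toy case with orthogonal 2-element "manifolds": the true paths are
# (DOA 0, delay 1) and (DOA 1, delay 0), so the noise subspace is spanned
# by the two remaining coordinate vectors.
doa_vectors = [[1.0, 0.0], [0.0, 1.0]]
delay_vectors = [[1.0, 0.0], [0.0, 1.0]]
noise_basis = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
print(pair_delays_with_doas(delay_vectors, doa_vectors, noise_basis))
```

The exhaustive search grows factorially with D, which is acceptable only for the small numbers of paths considered here.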

5.  ESTIMATION  OF  UNKNOWN  NOISE 
COVARIANCE 

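In the case of unknown noise covariance, the performance
of the separable dimension subspace estimation method will be
degraded. From equations (7) and (8), we can see
that the vector $\hat{\mathbf{r}}_h$ will be noise free when the TDL delay
$T_0$ is greater than the correlation time of the noise. Therefore,
to improve the estimation performance in the case of
unknown noise covariance, we can change the length of the
tapped-delay lines (TDL) from Q to 2Q (Q > D) to estimate
the unknown noise covariance $\sigma^2$ and then remove it from the
temporal/spatial vectors.

Let the first Q TDL outputs of the lth sensor be represented
by the vector
$\mathbf{x}_l = [x_l(nT_s),\ x_l(nT_s - T_0),\ \cdots,\ x_l(nT_s - (Q-1)T_0)]^T$
and the later Q TDL outputs by the vector
$\tilde{\mathbf{x}}_l = [x_l(nT_s - QT_0),\ x_l(nT_s - (Q+1)T_0),\ \cdots,\ x_l(nT_s - (2Q-1)T_0)]^T$.
Then, the sample cross-covariance matrix of $\mathbf{x}_l$ and $\tilde{\mathbf{x}}_l$ is

$$\tilde{\mathbf{R}}_l = \frac{1}{N}\sum_{n=1}^{N} [\mathbf{x}_l(n)\,\tilde{\mathbf{x}}_l(n)^H] \qquad (12)$$

We can estimate the temporal vector $\hat{\mathbf{r}}_h$ as

$$\hat{\mathbf{r}}_h = \frac{1}{M}\sum_{l=1}^{M}\frac{1}{N}\sum_{n=1}^{N} x_l(nT_s - h \cdot T_0)\,\tilde{\mathbf{x}}_l^H(n) \qquad (13)$$

With the method described in Section 4 and the estimated
vector $\hat{\mathbf{r}}_h$, we can estimate the multipath delays $\tau_i$, $i = 1, 2, \cdots, D$,
with the function $P(\tau) = (\mathcal{G}^H(\tau)\,\mathbf{P}_\tau\,\mathcal{G}(\tau))^{-1}$.

To estimate the signal covariance matrix $\mathbf{R}_s$, we can reconstruct
the modified temporal manifolds $\mathcal{G}(\hat{\tau}_i)$ and $\mathcal{G}'(\hat{\tau}_i)$
with the estimated delays $\hat{\tau}_i$, where
$\mathcal{G}'(\hat{\tau}_i) = [g(t_0 - QT_0 - \hat{\tau}_i),\ \cdots,\ g(t_0 - (2Q-1)T_0 - \hat{\tau}_i)]^T$, and have

$$\mathbf{R}_s = [\mathcal{G}^H(\hat{\tau})\mathcal{G}(\hat{\tau})]^{-1}\,\mathcal{G}^H(\hat{\tau})\,\tilde{\mathbf{R}}_l\,\mathcal{G}'(\hat{\tau})\,[\mathcal{G}'^H(\hat{\tau})\mathcal{G}'(\hat{\tau})]^{-1} \qquad (14)$$

With the estimated signal covariance matrix $\mathbf{R}_s$ and $\mathcal{G}(\hat{\tau})$, the
noise covariance matrix can be estimated by

$$\sigma^2\mathbf{I} = \mathbf{R}_l - \mathcal{G}(\hat{\tau})\,\mathbf{R}_s\,\mathcal{G}^H(\hat{\tau}) \qquad (15)$$

where $\mathbf{R}_l = E[\mathbf{x}_l(n)\,\mathbf{x}_l(n)^H]$.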

6.  COMPUTER  SIMULATIONS 

We assume that a transmitted signal with D = 3 paths arrives at a
linear array of M = 6 sensors with half-wavelength spacing. The
multipath delays are [0, 0.5, 1.2] T, T = 1, and the directions of
arrival are [15°, 40°, 70°]. The path fadings are [1, 0.85, 0.8].
Additive white Gaussian noise is added, with a corresponding SNR
of 10 dB. 100 samples are accumulated. The pulse shaping function
is a raised cosine with 0.35 excess bandwidth, the TDL length is
Q = 6 and the delay is $T_0$ = 0.5.
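The temporal manifold for this set-up can be sketched numerically. This is a minimal Python illustration, assuming the standard raised-cosine pulse with roll-off 0.35, symbol period T = 1 and sampling reference time $t_0$ = 0 (the value of $t_0$ is an assumption of this sketch); the function names are illustrative.

```python
import math


def raised_cosine(t, symbol_period=1.0, rolloff=0.35):
    """Raised-cosine pulse g(t) with the given roll-off factor."""
    x = t / symbol_period
    if abs(x) < 1e-12:
        return 1.0
    sinc = math.sin(math.pi * x) / (math.pi * x)
    denom = 1.0 - (2.0 * rolloff * x) ** 2
    if abs(denom) < 1e-9:
        # Limit at |t| = T / (2 * rolloff), where the closed form is 0/0.
        x0 = 1.0 / (2.0 * rolloff)
        return (math.pi / 4.0) * math.sin(math.pi * x0) / (math.pi * x0)
    return sinc * math.cos(math.pi * rolloff * x) / denom


def temporal_manifold(tau, q=6, tap_delay=0.5, t0=0.0):
    """G(tau) = [g(t0 - tau), g(t0 - T0 - tau), ..., g(t0 - (Q-1)T0 - tau)]."""
    return [raised_cosine(t0 - k * tap_delay - tau) for k in range(q)]


for delay in (0.0, 0.5, 1.2):
    print(delay, [round(v, 4) for v in temporal_manifold(delay)])
```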

Fig. 2 and Fig. 3 show the estimation results of multipath delays
and DOAs with 30 trials by using the proposed separable dimension
subspace method. Simulations and performance comparisons
with other proposed algorithms such as JADE [2] are presented
in Fig. 4 and Fig. 5.
7.  CONCLUSIONS


A new algorithm based on the separable dimension subspace method
is proposed for joint estimation of DOAs and multipath delays in
wireless communication systems. In this paper, spatial and
temporal separable dimension correlation processing is used to
replace EVD (Eigenvalue Decomposition) or SVD (Singular Value
Decomposition). Therefore, compared to other joint DOA and delay
estimation methods, the computational complexity of the proposed
method is relatively small. The presented algorithm has
been tested by computer simulation studies and has been found
to perform satisfactorily.


Fig.  2.  Delay  estimation  with  30  trials 


Fig.  3.  DOA  estimation  with  30  trials 
ABSTRACT

Direction finding by exploiting the directional radiation
patterns of a Switched Parasitic Antenna (SPA) is
considered. By employing passive elements (parasites),
which can be shorted to ground using pin diodes, directional
radiation patterns can be obtained. The direction
finding performance of the SPA is examined
by calculating a lower bound on the direction finding
accuracy, the Cramér-Rao lower Bound (CRB). It is
found that the SPA offers a compact implementation
with high-resolution direction finding performance using
only a single radio receiver. Thus, exploiting SPAs
for direction finding is an interesting alternative to traditional
antenna arrays, offering compact and low-cost
antenna implementations.

1.  INTRODUCTION 

Direction finding is of great importance in a variety
of applications, such as radar, sonar, communications,
and recently also personal locating services. In the last
two decades, direction finding and sensor array processing
have attracted considerable interest in the signal processing
community. The focus of this work has been on
high resolution, i.e. a resolution higher than the width
of the main lobe, Direction Of Arrival (DOA) estimation
algorithms [3]. These algorithms exploit the fact
that an electromagnetic wave received by an array
of antenna elements reaches each element at a different
time instant. Although the performance of these
systems is excellent, an unfortunate aspect is the high
cost of employing a radio receiver for each antenna element.
Furthermore, it is expensive to calibrate and
maintain antenna arrays with many antenna elements.

Recently, it was proposed to employ an SPA for direction
finding [6, 7] that only uses a single active radio
receiver, thereby significantly reducing the cost. The
SPA offers characteristics similar to an array antenna
with several beams by using passive antenna elements
that serve as reflectors when shorted to ground. Different
directional patterns can be achieved by switching
the short-circuits of the passive elements using pin
diodes. The possibilities of exploiting these patterns
for high-resolution DOA estimation will be examined in
this paper, since no attempt to employ high-resolution
DOA methods was undertaken in [6, 7].

1 This work was supported in part by the Swedish Foundation
for Strategic Research, under the Personal Computing and
Communications Program.

2 This work was supported in part by NUTEK.

Mattias Wennström 2

Signals and Systems Group
Uppsala University
mw@signal.uu.se

2.  SWITCHED  PARASITIC  ANTENNA 

Switched Parasitic Antennas offering directional patterns
date back to the early work of Yagi and Uda
in the 1930s [1]. The concept is to use a single active
antenna element, connected to a radio transceiver,
in a structure with one or several passive antenna elements,
operating near resonance. These passive elements
are called Parasitic Elements (PEs) and act together
with the active element to form an array, as in
the well-known Yagi-Uda array [1]. To alter the radiation
pattern, the termination impedances of the PEs
are switchable, to change the current flowing in those
elements. The PEs become reflectors when shorted to
the ground plane using pin diodes [8]; when not
shorted, the PEs have little effect on the antenna characteristics.
The receiver is always connected to the
center antenna element, so there are no switches in the
direct RF signal path.

An interesting possibility to obtain directional information
is to sample the received signal with several
different radiation patterns, since the switching time
of a pin diode is only of the order of a few nanoseconds.
This technique of oversampling the received signal
is common in many communication systems, but
here the oversampling is performed in both time and
space, i.e. spatio-temporal oversampling. If the increased
sampling rate (or bandwidth) poses a problem,
a bandpass sampling strategy could also be employed.
In this paper, the potential in using the different radiation
patterns of an SPA for direction finding will
be examined. However, further work is needed on the
practical aspects of the antenna design as well as sampling
strategies.

Figure 1: A five element monopole SPA. The center
element is active and connected to the transceiver. The
four passive antenna elements can be switched in or out
of resonance using appropriately biased pin diodes.

In  the  literature,  it  has  been  proposed  to  use  mono¬ 
poles  on  a  ground  plane  [8]  or  patch  antennas  [5]  as 
SPAs.  In  this  paper,  a  monopole  on  a  ground  plane 
is  used  because  of  its  omnidirectional  properties.  A 
four  Direction  Symmetry  (4-DS)  monopole  parasitic 
antenna  is  shown  in  Figure  1  and  a  three  Direction 
Symmetric (3-DS) antenna is shown in Figure 2. The antenna in Figure 2 has an additional circle of parasitic elements that are always shorted to ground. The effect of this arrangement is an increased directivity, since these elements are shorter than the corresponding resonant length ($\approx \lambda/4$) and will therefore lead the induced Electro-Motive Force (EMF) [1]. The lengths and distances displayed in Figures 1 and 2 are not optimal in any way. Note that the resulting antennas are very compact ($\lambda/4$ and $\lambda/2$) compared to corresponding linear arrays with $\lambda/2$ separation distance ($2\lambda$ and $3\lambda/2$).

The  antennas  in  Figures  1  and  2  were  simulated  us¬ 
ing  HFSS  (High  Frequency  Structure  Simulator)  from 
Agilent  Technologies  Inc.  which  is  a  3D  simulator  using 
the  Finite  Element  Method  (FEM)  to  solve  for  the  elec¬ 
tromagnetic  field.  The  software  was  used  to  calculate 
the far-field radiation pattern of the antenna for different settings of the switched parasitics. The monopole elements were cylindrical with a length to radius ratio $l/r = 100$, which yields a first resonance at approximately $0.24\lambda$ [1].

The far-field power radiation pattern in the azimuth plane, $F(\phi)$, for the 4-DS SPA is shown in Figure 3. The corresponding pattern for the 3-DS SPA is similar and is not shown. The directivities of the two antennas are 9.9 dB and 10.0 dB, respectively. Thus, only a small gain in directivity was achieved by the extra ring of shorted parasitics.

Figure 2: A seven element monopole SPA. The center element is active and connected to the transceiver. The three passive antenna elements closest to the active element can be switched in or out of resonance using appropriately biased pin diodes. The three outermost monopoles are hardwired to ground.

Figure 3: Power radiation pattern of the five element monopole antenna shown in Figure 1 with three parasitics shorted (S) to ground and one open (O).

Once the far-field radiation properties are found, it is straightforward to derive a model for the received voltages [2]. If $p$ waves are incident upon an antenna with $M$ symmetry directions, the received voltages can be written in matrix form as

$$\mathbf{x}(t) = \mathbf{A}(\boldsymbol{\phi})\,\mathbf{s}(t) + \mathbf{e}(t), \qquad (1)$$

where the vector of measured voltages $\mathbf{x}(t)$ is $M \times 1$. The matrix $\mathbf{A}(\boldsymbol{\phi})$ ($M \times p$) corresponds to the response of the different symmetry directions and has elements $[\mathbf{A}(\boldsymbol{\phi})]_{qr} = F(\phi_r + 2q\pi/M)$. This matrix is often called the steering matrix in the sensor array processing literature. The signal vector $\mathbf{s}(t)$ is $p \times 1$ and contains the strength of the received fields. Finally, the noise vector $\mathbf{e}(t)$ is $M \times 1$.
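As an illustration of the data model (1), the following minimal sketch builds the steering matrix from a pattern function and draws noisy voltage snapshots. The `cardioid` pattern is a hypothetical stand-in for the simulated far-field pattern $F(\phi)$, the switch index $q$ is assumed to run from 0 to $M-1$, and all function names are chosen here for illustration only.

```python
import numpy as np

def spa_steering_matrix(pattern, doas, M):
    """Return the M x p matrix with entries F(phi_r + 2*pi*q/M), q = 0..M-1."""
    doas = np.atleast_1d(np.asarray(doas, dtype=float))
    q = np.arange(M).reshape(-1, 1)  # index of the symmetry direction (switch state)
    return pattern(doas[None, :] + 2.0 * np.pi * q / M)

def cardioid(phi):
    # Hypothetical pattern, NOT the HFSS-simulated pattern of the paper
    return 0.5 * (1.0 + np.cos(phi))

def simulate(pattern, doas, M, S, sigma2, N, rng):
    """Draw N snapshots x(t) = A s(t) + e(t) with s ~ CN(0, S), e ~ CN(0, sigma2*I)."""
    A = spa_steering_matrix(pattern, doas, M)
    p = A.shape[1]
    Ls = np.linalg.cholesky(S)  # assumes S is positive definite
    s = Ls @ (rng.standard_normal((p, N)) + 1j * rng.standard_normal((p, N))) / np.sqrt(2.0)
    e = np.sqrt(sigma2 / 2.0) * (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N)))
    return A @ s + e

rng = np.random.default_rng(0)
X = simulate(cardioid, np.deg2rad([30.0, 50.0]), M=4, S=np.eye(2), sigma2=0.1, N=200, rng=rng)
```

Each column of `X` holds the $M$ voltages obtained by cycling through the switch states, which is the spatio-temporal sampling described earlier.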

In  order  for  the  analysis  in  the  following  sections  to 
be  valid,  some  additional  assumptions  are  needed: 

• the steering matrix has full rank, i.e., $\mathrm{rk}(\mathbf{A}) = p$

• $\mathbf{e}(t)$ is temporally white and circularly Gaussian distributed: $\mathbf{e}(t) \in \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I})$

• $\mathbf{s}(t)$ is also temporally white and circularly Gaussian distributed: $\mathbf{s}(t) \in \mathcal{N}(\mathbf{0}, \mathbf{S})$

The noise is both spatially and temporally white, while the signal is only assumed to be temporally white. Furthermore, the signal is assumed to be uncorrelated with the noise.

Figure 4: The square root of the CRB for the configurations in Figures 1 and 2 when two waves are incident from $(30^\circ, 30^\circ + \Delta)$ with SNR = 10 dB and 1000 samples.

Figure 5: The square root of the CRB for the configurations in Figures 1 and 2 when two waves are incident from $(\phi_0, \phi_0 + 5^\circ)$ with SNR = 10 dB and 1000 samples.


3.  DIRECTION  FINDING  PERFORMANCE 


The data model (1) is identical to the usual data model used in sensor array processing [3], except for a new steering matrix. This will of course change the direction finding properties. Before the properties of a specific DOA estimation scheme are studied, a lower bound on the variance of the DOA estimates, the Cramér-Rao lower Bound (CRB), will be analyzed. Note that many methods in the literature achieve this bound asymptotically [3].

Expressions for the CRB were derived for an array of antenna elements in [4] and can also be applied to the parasitic antenna by changing the steering matrix.

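$$E\{(\hat{\boldsymbol{\phi}} - \boldsymbol{\phi}_0)(\hat{\boldsymbol{\phi}} - \boldsymbol{\phi}_0)^T\} \geq \mathbf{B} \qquad (2)$$

$$\mathbf{B} = \frac{\sigma^2}{2N}\left[\mathrm{Re}\left\{(\mathbf{D}^H \mathbf{P}_A^{\perp} \mathbf{D}) \odot (\mathbf{S}\mathbf{A}^H \mathbf{R}^{-1} \mathbf{A}\mathbf{S})^T\right\}\right]^{-1}$$

where the elements of $\mathbf{D}$ are $D_{qr} = \left.\partial F(\phi + 2q\pi/M)/\partial\phi\,\right|_{\phi=\phi_r}$. Furthermore, $\odot$ denotes the Hadamard (or Schur) product, i.e., element-wise multiplication, and $\mathbf{P}_A^{\perp} = \mathbf{I} - \mathbf{P}_A = \mathbf{I} - \mathbf{A}\mathbf{A}^{\dagger}$ (see footnote 1) is the orthogonal projector onto the null space of $\mathbf{A}^H$. The matrix $\mathbf{R} = \mathbf{A}\mathbf{S}\mathbf{A}^H + \sigma^2\mathbf{I}$ is the covariance matrix of the measured voltages $\mathbf{x}(t)$ and $N$ denotes the number of time samples.

1 $\mathbf{M}^{\dagger}$ is the Moore-Penrose pseudo inverse of $\mathbf{M}$.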
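The bound in (2) can be evaluated numerically along the following lines. This is a sketch under stated assumptions: the `pattern` function is a hypothetical smooth directional pattern (not the simulated one), the derivative matrix is approximated by central differences with a step `h` chosen here, and the $\sigma^2/(2N)$ scaling of the stochastic CRB is assumed.

```python
import numpy as np

def pattern(phi, kappa=2.0):
    # Hypothetical directional pattern standing in for the simulated F(phi)
    return np.exp(kappa * (np.cos(phi) - 1.0))

def steering_matrix(doas, M):
    q = np.arange(M).reshape(-1, 1)
    return pattern(np.asarray(doas, dtype=float)[None, :] + 2.0 * np.pi * q / M)

def stochastic_crb(doas, M, S, sigma2, N, h=1e-6):
    """CRB matrix B for the DOAs (radians^2), following the form of (2)."""
    doas = np.asarray(doas, dtype=float)
    A = steering_matrix(doas, M)
    # Column r of D is the derivative of the r-th steering vector at phi_r
    D = (steering_matrix(doas + h, M) - steering_matrix(doas - h, M)) / (2.0 * h)
    P_perp = np.eye(M) - A @ np.linalg.pinv(A)          # projector onto null(A^H)
    R = A @ S @ A.conj().T + sigma2 * np.eye(M)          # data covariance
    H = D.conj().T @ P_perp @ D
    G = S @ A.conj().T @ np.linalg.solve(R, A @ S)
    return sigma2 / (2.0 * N) * np.linalg.inv(np.real(H * G.T))  # '*' is the Hadamard product

doas = np.deg2rad([30.0, 40.0])
B = stochastic_crb(doas, M=4, S=np.eye(2), sigma2=0.1, N=1000)
std_deg = np.rad2deg(np.sqrt(np.diag(B)))  # standard deviation bound in degrees
```

With unit-power uncorrelated signals, `sigma2=0.1` corresponds to an SNR of 10 dB. Sweeping the second DOA would produce a curve of the kind plotted in Figure 4, although the numbers depend entirely on the assumed pattern.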

The square root of the CRB, i.e. the standard deviation, is shown in Figure 4 for the antenna configurations in Figures 1 and 2 when two waves are incident from $(30^\circ, 30^\circ + \Delta)$. Only the CRB for the first DOA, i.e. the wave arriving from $30^\circ$, is shown since the CRB for the second DOA behaves similarly. The standard deviation for a uniform linear array of three elements spaced $\lambda/2$ apart is compared to the 4-DS and 3-DS SPAs. As expected, the performance is better when using four rather than three symmetry directions. Also, note that the three element array performs slightly better than the 4-DS SPA. However, these results depend on the incidence angles, since the array will work best for broadside and worst for end-fire incidence.

In Figure 5, the standard deviation is shown for the same antenna configurations as in Figure 4 when two waves are incident from $(\phi_0, \phi_0 + 5^\circ)$. The parasitic
antenna,  due  to  its  symmetrical  properties,  offers  simi¬ 
lar  direction  finding  performance  properties  for  all  inci¬ 
dence  angles.  The  linear  array  performs  worse  than  the 
parasitic  antenna  at  end-fire  incidence,  while  perform¬ 
ing  much  better  at  broad-side  incidence.  However,  for 
many  direction  finding  applications,  the  direction  find¬ 
ing  performance  of  the  parasitic  antenna  is  sufficient 
and  the  cost  reduction  of  using  only  a  single  radio  re¬ 
ceiver  outweighs  the  loss  in  performance  for  broad-side 
angles.  It  should  also  be  stressed  that  the  antenna  de¬ 
signs  in  Figure  1  and  2  are  by  no  means  optimal  and 
better  DOA  properties  may  be  obtained  by  a  proper 
optimization.


4. ESTIMATION METHODS


The  analysis  in  the  previous  section  was  based  on  the 
CRB  on  the  estimation  error.  In  this  section,  algo¬ 
rithms  that  approximately  achieve  this  lower  bound 
will  be  discussed.  In  principle,  all  DOA  estimation 
schemes  derived  for  a  general  antenna  array  can  also 
be  applied  to  a  parasitic  antenna  by  inserting  a  new 
steering  matrix.  For  an  overview  of  DOA  estimation 
methods,  see  [3]. 

In [9], a popular high resolution DOA estimation method, Multiple Signal Classification (MUSIC), was introduced, in which the DOA estimates are taken as those $\phi$ that maximize the MUSIC criterion function

$$\hat{\phi} = \arg\max_{\phi} \frac{\mathbf{a}^H(\phi)\,\mathbf{a}(\phi)}{\mathbf{a}^H(\phi)\,\mathbf{E}_n \mathbf{E}_n^H\,\mathbf{a}(\phi)} \qquad (3)$$

where the steering vector has elements $a_q(\phi) = F(\phi + 2q\pi/M)$. Usually this is formulated as finding the $p$ largest peaks in the "MUSIC spectrum". Here, $\mathbf{E}_n$ denotes the $M - p$ eigenvectors corresponding to the $M - p$ smallest eigenvalues of the estimated covariance matrix $\hat{\mathbf{R}}$. A typical example of a MUSIC spectrum is shown in Figure 6, where two waves are incident from 25° and 45° upon a 4-DS SPA and a three element array with SNR = 10 dB and 1000 samples. This figure indicates that the SPA, in this case, offers a high-resolution direction finding performance similar to that of an antenna array, without the cost of many radio receivers. Most other DOA estimation schemes [3] can also be applied to SPAs with similar results. For instance, the Stochastic Maximum Likelihood (SML) algorithm [4] was implemented for this type of antenna. The RMSE of the ML estimator attained the CRB from Section 3, as expected.


5.  CONCLUSIONS 

The  potential  use  of  a  Switched  Parasitic  Antenna  for 
high-resolution direction finding was investigated. By
employing  passive  elements,  which  can  be  shorted  to 
ground  using  pin  diodes,  directional  radiation  patterns 
are  obtained  that  can  be  used  successfully  to  estimate 
DOAs.  The  main  advantage  with  this  concept  is  that 
only  one  radio  receiver  is  needed,  thereby  reducing  the 
costs  significantly  compared  to  traditional  antenna  ar¬ 
rays  where  one  radio  receiver  per  element  typically  is 
employed.  Another  advantage  of  the  SPA  is  that  a  very 
compact  implementation  of  the  antenna  is  possible. 

A data model for the SPA was presented and the direction finding performance was examined by calculating the CRB and the MUSIC estimator. It was found that the SPA offers a compact implementation with high-resolution direction finding performance using only a single radio receiver. Thus, exploiting SPAs for direction finding is an interesting alternative that offers several advantages over traditional arrays.

Figure 6: The normalized MUSIC spectrum when two waves are incident from 25° and 45° upon a 4-DS parasitic antenna and a three element array with SNR = 10 dB and 1000 samples.

ABSTRACT

High resolution methods such as MUSIC fail to separate closely spaced sources in difficult contexts (low SNR, short sample size, ...). Halder et al. (1997) applied an interleaving technique to improve the resolution as well as the performance in the case of frequency estimation. Here we extend this work and deal with the application of this technique to array processing. We aim to estimate closely spaced DOAs. After a first estimation with MUSIC, a second step of the algorithm consists in refining the angle resolution using downsampled covariance matrices together with a Joint Estimation Strategy (JES) similar to that proposed by Gershman et al. (1996). This method improves MUSIC performance, especially for low SNRs. Simulation examples are provided to illustrate the performance of the proposed method, referred to as Two Step-MUSIC (TS-MUSIC).

1.  INTRODUCTION 

Direction of Arrival (DOA) estimation is a recurrent problem in array processing that can be treated with high resolution methods such as MUSIC (Multiple Signal Classification) [4]. Unfortunately, these methods become less efficient as the DOAs come closer. Recently, Halder et al. have suggested a temporal downsampling technique that improves the frequency estimation of subspace-based methods [3]. The effect of the downsampling is to artificially increase the separation between the sources. Here, we apply this technique to DOA estimation by replacing temporal downsampling with spatial downsampling. More precisely, the proposed technique combines the effects of spatial downsampling and JES to further improve the resolution of MUSIC for closely spaced DOAs. Section 2 presents the data model and problem formulation. In Section 3, the TS-MUSIC algorithm is introduced and discussed. Section 4 provides simulation results to assess the performance of TS-MUSIC in comparison with that of the MUSIC algorithm. Concluding remarks are given in Section 5.


2.  PROBLEM  FORMULATION 

In the following, we consider a uniform linear array (ULA) of $N$ antennas separated by half a wavelength (see footnote 1). $T$ samples are collected on each antenna. We assume $d$ plane wave sources impinging on the array from angles $\theta_1, \ldots, \theta_d$. The received signal is corrupted by additive white Gaussian noise and is expressed as [4]:

$$\mathbf{x}(t) = \mathbf{A}\mathbf{s}(t) + \mathbf{n}(t), \qquad t = 1, \ldots, T$$

where $\mathbf{A} = [\mathbf{a}(\theta_1) \; \ldots \; \mathbf{a}(\theta_d)]$ is the $(N \times d)$ steering matrix, $\mathbf{a}(\theta) = [1 \;\; e^{j\pi\sin(\theta)} \;\; \ldots \;\; e^{j\pi\sin(\theta)(N-1)}]^T$ is the $(N \times 1)$ steering vector toward the direction $\theta$, $\mathbf{s}(t)$ is the $(d \times 1)$ vector of zero-mean random source waveforms, $\mathbf{n}(t)$ is the $(N \times 1)$ vector of white zero-mean sensor noise and $(\cdot)^T$ denotes the transposition operator. The following assumptions on the model are considered to hold throughout this work:

• $A_1$: the number of sensors is at least $L$ times ($L > 1$ a positive integer) larger than the number of sources.

• $A_2$: the number of sources is unknown (but small in comparison with $N$) and two or more sources are closely located.

Our objective is to improve the source detection and resolution performance of MUSIC using both spatial downsampling and a joint estimation strategy.

3.  TS-MUSIC 

The first step of our algorithm consists in applying the standard MUSIC method: the singular value decomposition (SVD) of the covariance matrix of the received signal is computed, the noise subspace is built and the source angles are estimated by minimizing the projection of the steering vector $\mathbf{a}(\theta)$ on the noise subspace [5]. Now, in the situation where two or more sources are closely located, MUSIC fails to separate them and estimates fewer DOAs than there are actual sources. The following is proposed to improve the resolution.

1 The results in this paper do not require a specific geometry of the array. We assumed the ULA model just to simplify the notation.

Figure 1: Example of downsampling with L=2.


3.1.  Spatial  Downsampling 


To improve the detection performance, we propose to use spatial downsampling and to re-apply MUSIC in a second step to the downsampled data.

Let $L$ be the downsampling factor ($L > 1$) and $M > d$ be the size of each subarray (see figure 1 for an illustration of spatial downsampling with $L = 2$).

The steering vector corresponding to the $k$th subarray is:

$$\mathbf{a}_k(\theta) = e^{j\pi(k-1)\sin(\theta)} \left[ 1 \;\; e^{j\pi L \sin(\theta)} \;\; \ldots \;\; e^{j\pi(M-1)L\sin(\theta)} \right]^T$$

It is clear that the downsampling artificially increases the separation between the sources by a factor $L$. This results in improved performance of the method.

Let us illustrate our algorithm with the following example: consider the case where we have two sources located at $\theta_1 = \theta + \delta\theta_1$ and $\theta_2 = \theta + \delta\theta_2$ with $|\delta\theta_m| \ll 1$, $m = 1, 2$. We can then write

$$\mathbf{a}(\theta_m) = \left[ 1 \;\; e^{j\pi\sin(\theta+\delta\theta_m)} \;\; \ldots \;\; e^{j(N-1)\pi\sin(\theta+\delta\theta_m)} \right]^T, \qquad m = 1, 2$$

Applying MUSIC in this context will result in one angle estimate $\hat{\theta}$ (i.e. MUSIC fails to distinguish between the two sources).

In a second step, we use a downsampling factor $L > 1$ and the previous estimate of $\theta$ (i.e. $\hat{\theta}$) to separate the sources and improve the estimation of their respective DOAs. More precisely, we re-apply the MUSIC algorithm by restricting our search to the vicinity of $\hat{\theta}$ and using the following expression for the steering vector:

$$\mathbf{a}(\delta\theta_i) = \left[ 1 \;\; e^{j\pi L \sin(\hat{\theta}+\delta\theta_i)} \;\; \ldots \;\; e^{j(M-1)L\pi\sin(\hat{\theta}+\delta\theta_i)} \right]^T$$

This has the advantage of "virtually" increasing the angle difference between $\theta_1$ and $\theta_2$ by a factor of $L$, which leads to a better resolution of the two angles.
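The interleaving described above can be sketched as follows. The function names are illustrative, sensors and subarrays are indexed from 0 (so subarray `k` here corresponds to subarray $k+1$ in the text), and any sensors beyond $ML$ are simply left unused, which is an assumption made for this sketch.

```python
import numpy as np

def ula_steering(theta, N):
    """Steering vector of a half-wavelength ULA with N sensors."""
    return np.exp(1j * np.pi * np.arange(N) * np.sin(theta))

def interleaved_subarrays(X, L):
    """Split N x T snapshots into L interleaved subarrays of M = N // L sensors each."""
    M = X.shape[0] // L
    # Subarray k takes sensors k, k + L, ..., k + (M - 1) * L
    return [X[k:k + M * L:L, :] for k in range(L)]

def subarray_steering(theta, M, L):
    """Steering vector of one subarray, up to the phase factor exp(j*pi*k*sin(theta))."""
    return np.exp(1j * np.pi * L * np.arange(M) * np.sin(theta))

# A 10-sensor array with L = 3 gives three subarrays of M = 3 sensors
theta = np.deg2rad(22.0)
a_full = ula_steering(theta, 10)[:, None]
subs = interleaved_subarrays(a_full, 3)
```

Each subarray is again a uniform array, but with inter-element spacing $L\lambda/2$, which is why the apparent separation between nearby sources grows by the factor $L$ within the restricted search sector.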

3.2.  Joint  Estimation  Strategy 

One of the major issues of high resolution methods for DOA estimation is the determination of the number of sources. Here, we propose to use a technique based on a joint estimation strategy in the same spirit as the one given by Gershman and Böhme in [1]. In their contribution, resampling techniques are used to "artificially" build several trials of the observations. Then different methods are used to estimate the DOAs from each trial. The main idea of their algorithm is that these methods show different local behavior (see footnote 2). The reason is that DOA estimation methods are noise sensitive and thus their local behavior is different in each estimation trial. For example, considering two estimators with comparable performance, one can always find some trials where the first one resolves the sources while the second does not. Using this JES, the number of sources is determined as the maximum number (best case) obtained from the different estimation trials. We have adapted this technique in the following manner:

We first make a coarse estimation of the number and positions of the sources using standard MUSIC applied to the global array outputs. If $q$ peaks appear, this leads to the determination of a set of $q$ angular intervals in which the real angles lie. For example, each interval can be defined by the -3 dB points around those peaks. Now, to refine the search, we consider one of the angular sectors determined before. As we work on subarrays in the second step of our algorithm, we can expect that each subset of antennas leads to a different estimation behavior because the noise realizations are different on each antenna. So we can perform a selection on the subarrays to get the best estimates.

In our case, if $L$ is the interleaving factor, we can obtain $L$ different sets of DOA estimates. Then, among these $L$ sets of estimates, we keep those that give the highest number of peaks in the angular sector being scanned.

The  algorithm  can  be  summarized  as  follows: 

First step:

1. Coarse estimation of the number $q$ of sources by a conventional method like beamforming or MUSIC.

2. Definition of the set of intervals in which the refined search will be performed:

$$\bigcup_{i=1}^{q} \left[\theta^{i}_{\mathrm{left}},\, \theta^{i}_{\mathrm{right}}\right]$$

where $\theta^{i}_{\mathrm{left}}$ and $\theta^{i}_{\mathrm{right}}$ are the left and right bounds of the angular sector corresponding to the $i$th DOA.

2 Local behavior refers to the instantaneous performance of any estimation algorithm achieved in a single trial without any statistical averaging.

Second  step : 

3.  Application  of  the  interleaving  method  leading  to  L 
subarrays  outputs  (L  trials). 

4.  On  each  trial  apply  MUSIC  algorithm  where  the  num¬ 
ber  of  sources  in  each  angular  sector  is  selected  as  the 
number  of  peaks  of  MUSIC  spectrum3  observed  in 
this  sector.  Each  peak  argument  corresponds  to  an 
estimate  of  a  source  angle. 

5.  The  number  of  sources  in  each  angular  sector  is  cho¬ 
sen  as  the  maximum  number  of  sources  detected  from 
the  L  subarrays. 

6. In each angular sector, select the $p$ subarrays (trials) that lead to the maximum number of sources in this interval and, after sorting the angles, compute the final angle estimates as their averaged values, i.e.

$$\hat{\theta}_j = \frac{1}{p}\sum_{l=1}^{p} \hat{\theta}_j^{\,l}$$

where $\hat{\theta}_j^{\,l}$, $l = 1, \ldots, p$, represent the $p$ estimates of $\theta_j$ from $p$ different subarrays.

3.3. Discussion

We discuss here the possible extensions of the algorithm and give some observations on our method:

• We have applied array downsampling plus JES to improve MUSIC resolution. Other standard localization techniques [4] can be used as well and improved in the same way. Also, it is always possible, as in [1, 2], to use different estimation methods on each subarray and combine the results according to the JES.

• The proposed method does not necessarily improve the estimation accuracy of the DOAs but only their resolution when they are closely spaced. The reason is that in the second step of TS-MUSIC, subarrays ($L$ times smaller than the global array) are used to estimate the DOAs, which may deteriorate the estimation accuracy (see figure 6 for illustration).

• The method can be further improved by applying both spatial downsampling and temporal resampling, e.g. bootstrap, on each subarray to increase the number of estimation trials.

• A major limitation of TS-MUSIC is the restrictive condition $N > Ld$ with $L > 1$. An alternative solution would be to use, in the second step of the method, the ESPRIT algorithm [4] with 2 subarrays spaced by $L$ sensor inter-spaces (see figure 2). In this case, condition $A_1$ (§ 1) becomes $N > d + L$, which is much less restrictive than the previous one. Furthermore, a resampling technique can be used to apply the JES to further improve the resolution.

Figure 2: Two subarrays spaced by L=3 sensor inter-spaces.

3 We limit our angle search in the second step of the algorithm to the angular sectors previously computed.
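The selection and averaging in steps 5 and 6 of the algorithm can be sketched as below for a single angular sector. The function name and the input format (one list of peak angles per subarray) are assumptions made for this illustration; the peak angles themselves would come from MUSIC applied to each subarray.

```python
import numpy as np

def jes_combine(estimates):
    """Combine per-subarray peak angles found in one angular sector.

    estimates: list with one sequence of peak angles per subarray (trial).
    Returns the averaged, sorted angle estimates from the subarrays that
    detected the largest number of sources.
    """
    counts = [len(e) for e in estimates]
    n_src = max(counts)                       # step 5: best-case number of sources
    if n_src == 0:
        return np.array([])
    # Step 6: keep only the trials that resolved n_src sources, sort, then average
    kept = [np.sort(np.asarray(e, dtype=float)) for e in estimates if len(e) == n_src]
    return np.mean(np.vstack(kept), axis=0)

# Three subarrays (L = 3): two of them resolve two sources, one sees a single peak
theta_hat = jes_combine([[24.1, 21.8], [23.0], [22.2, 23.9]])
```

In this example the subarray that reports a single peak is discarded, and the remaining two sets of sorted angles are averaged position by position.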


4.  SIMULATION  EXAMPLE 

To illustrate the performance improvement achieved by our method, we consider a simple example of $d = 2$ equipower sources impinging on the array from $\theta_1 = 22^\circ$ and $\theta_2 = \theta_1 + \delta\theta$, $\delta\theta$ being a small angle difference. The ULA consists of $N = 10$ antennas. The sample size is set to $T = 100$. The results that we obtained are based on 1000 independent Monte Carlo experiments. We have applied both MUSIC and TS-MUSIC for comparison. In figure 3, we plot the resolution probability (i.e. the percentage of successful detection of the exact number of sources) versus the angle difference for an SNR of 5 dB and a downsampling factor $L = 3$. We can see that for low SNRs, TS-MUSIC achieves a higher rate of successful source separation in comparison with MUSIC. For example, for an angle difference of 3° the resolution probability of MUSIC was less than 10% while it is 50% for TS-MUSIC. In figure 4, we plot the resolution probability versus the SNR for an angle difference $\delta\theta = 2^\circ$ and a downsampling factor $L = 3$. A significant improvement is obtained in terms of resolution probability for low and moderate SNRs. In figure 5, we represent the histograms of the number of sources detected by MUSIC and TS-MUSIC for an SNR of 0 dB and different angular distances. In figure 6, we plot the DOA MSE (mean square error) against SNR for $L = 2$ and an angle difference of $\delta\theta = 8^\circ$ (we chose here a situation where both MUSIC and TS-MUSIC achieve a correct source separation). We note that the accuracies of the estimation with MUSIC and TS-MUSIC are very close. This illustrates the fact that, in such a context, the proposed method only improves the resolution but not necessarily the estimation accuracy.

Figure 3: Detection rate (in %) vs. $\delta\theta$ (SNR = 5 dB, L=3, T=100, 1000 trials)

Figure 4: Detection rate (in %) vs. SNR

[Histogram panels: SNR = 5 dB, L=2; delta = 2, 2.75, 3.25, 4.25; MUSIC and TS-MUSIC]

Figure 6: MSE vs. SNR (dB) (L=2, 100 trials, 10 antennas)


5.  CONCLUSION 

We  have  introduced  a  new  method  TS-MUSIC  based  on  the 
combination  of  spatial  downsampling  and  a  joint  estimation 
strategy  to  improve  array  resolution.  Spatial  downsampling 
artificially  increases  sources  separation  and  provides  several 
subarrays  (i.e.  trials)  whereas  JES  enables  a  selection  of  the 
best  estimates  that  are  obtained  with  the  different  subarrays. 

6.  REFERENCES 

[1] Alex B. Gershman and Johann F. Böhme. Joint estimation strategy with application to eigenstructure methods. 8th IEEE Signal Processing Workshop on Statistical Signal and Array Processing, June 24-26, 1996.

[2] Alex B. Gershman and Johann F. Böhme. Improved DOA estimation via pseudo-random resampling of spatial spectrum. IEEE Signal Processing Letters, 4(2):54-57, February 1997.

[3] Bijit Halder and Thomas Kailath. Efficient estimation of closely spaced sinusoidal frequencies using subspace-based methods. IEEE Signal Processing Letters, 4(2):49-51, 1997.

[4] Don H. Johnson and Dan E. Dudgeon. Array Signal Processing: Concepts and Techniques. Prentice Hall, 1993.

[5] R. O. Schmidt. Multiple emitter location and signal parameter estimation. Proceedings of the RADC Spectral Estimation Workshop, Rome (N.Y.):243-258, 1979.


Figure  5:  Estimated  sources  number  histograms 




OPTIMIZATION  OF  ELEMENT  POSITIONS 
FOR  DIRECTION  FINDING  WITH  SPARSE  ARRAYS 


Fredrik  Athley 

Department  of  Signals  and  Systems 
Chalmers  University  of  Technology 
SE-412  96  Goteborg,  Sweden 
athley@s2.chalmers.se 


ABSTRACT 

Sparse  arrays  are  attractive  for  Direction-Of-Arrival  (DOA)  esti¬ 
mation  since  they  can  provide  accurate  estimates  at  a  low  cost.  A 
problem  of  great  interest  in  this  matter  is  to  determine  the  element 
positions  that  yield  the  best  DOA  estimation  performance.  A  ma¬ 
jor  difficulty  with  this  problem  is  to  define  a  suitable  performance 
measure  to  optimize.  In  this  paper,  a  novel  criterion  is  proposed 
for  optimizing  element  positions.  The  ambiguity  threshold  of  the 
Weiss-Weinstein  Bound  (WWB)  is  used  to  optimize  the  element 
positions  of  a  sparse  linear  array.  The  array  obtained  from  the  op¬ 
timization  is  compared  with  some  other  sparse  array  structures  that 
have  been  proposed  in  the  literature. 

1.  INTRODUCTION 

Direction-Of-Arrival  (DOA)  estimation  using  an  array  of  sensors 
finds  application  in  many  fields,  such  as  radar,  sonar,  communica¬ 
tions  etc.  Over  the  past  decades,  there  has  been  intense  research  in 
this area, see e.g. [1] and the references therein. The DOA estimation accuracy is critically dependent on the array size. Large arrays can thus provide very accurate estimates. Arrays with many elements are, however, expensive to implement, both in terms of receiver hardware and computational complexity.

For  non-ambiguous  DOA  estimation  with  Uniform  Linear  Ar¬ 
rays  (ULAs),  the  inter-element  spacings  should  not  exceed  half 
a  wavelength  of  the  impinging  wavefronts.  In  sparse  arrays,  ele¬ 
ments  are  spaced  further  apart  in  order  to  obtain  a  large  aperture 
with  few  elements.  Sparse  arrays  thus  have  the  potential  of  very 
accurate  DOA  estimation  at  a  low  cost.  The  price  paid  is  the  risk 
of  obtaining  ambiguous  estimates,  caused  by  grating  lobes  in  the 
array  beam  pattern.  To  reduce  such  grating  lobes,  non-uniform 
element  spacing  is  employed.  An  important  problem  is  then  to 
determine  which  element  positions  yield  the  most  accurate  DOA 
estimates. 

Different  approaches  in  optimizing  the  element  positions  with 
respect  to  DOA  estimation  accuracy  have  been  taken  in  the  lit¬ 
erature.  In  [2,  3],  the  element  positions  of  Non-Uniform  Linear 
Arrays  (NULAs)  were  optimized  by  minimization  of  the  Cramer- 
Rao  Bound  (CRB).  A  problem  with  this  approach  is  that  the  CRB 
is  a  local  bound  that  does  not  take  into  account  large  estimation  er¬ 
rors  caused  by  near  ambiguities.  For  the  single  signal  problem  this 
means  that  only  the  curvature  of  the  mainlobe  is  considered;  high 
sidelobes have no effect. At low Signal-to-Noise Ratios (SNRs)
these sidelobes may cause large estimation errors, rendering the
CRB  a  far  too  optimistic  bound  in  this  case. 

Various  approaches  have  been  proposed  to  account  for  near 
ambiguities.  In  [4]  the  mainlobe  area  was  minimized  subject  to  a 
peak  sidelobe  constraint  and  in  [5]  competitive  criteria  involving 
maximum  aperture  and  identifiability  were  considered.  Although 
these  approaches  are  intuitively  appealing,  there  is  no  explicit  con¬ 
nection  between  these  ambiguity/aperture  trade-offs  and  the  result¬ 
ing  mean  square  estimation  error. 

In  this  paper,  another  approach  is  taken.  A  lower  bound  on 
the  mean  square  estimation  error  that  takes  ambiguity  errors  into 
account  is  used  to  optimize  the  element  positions  of  a  NULA  with 
fixed  aperture.  The  bound  used  is  the  Weiss-Weinstein  Bound 
(WWB),  which  was  first  presented  in  [6]  and  subsequently  applied 
to  DOA  estimation  in  e.g.  [7,  8,  9,  10]. 

2.  PROBLEM  FORMULATION 

Consider a linear array of $K$ sensors receiving a single planar wavefront from the DOA $\theta$ measured relative to the array boresight. For mathematical convenience, the estimation of $u = \sin\theta$ is considered. The element positions, denoted by $d_k$, $k = 1, \ldots, K$, are normalized by the standard spacing $\lambda/2$ where $\lambda$ is the wavelength, i.e. $d_k = 2\tilde{d}_k/\lambda$ where $\tilde{d}_k$ is the physical distance. In the sequel, different linear array geometries, keeping the array length $D$ (normalized by $\lambda/2$) fixed, will be studied. Without loss of generality, the end elements $d_1$ and $d_K$ are fixed at 0 and $D$, respectively. Assuming an ideal array with omnidirectional elements, the array output at time $t$ can be modeled by the $K \times 1$ complex vector

$$\mathbf{x}(t) = \mathbf{a}(u)\,s(t) + \mathbf{n}(t), \qquad t = 1, \ldots, N \qquad (1)$$

where

$$\mathbf{a}(u) = \left[ 1 \;\; e^{-j\pi d_2 u} \;\; \ldots \;\; e^{-j\pi d_{K-1} u} \;\; e^{-j\pi D u} \right]^T$$

is the $K \times 1$ array steering vector. Furthermore, $s(t)$ denotes the impinging signal at baseband, $\mathbf{n}(t)$ is an additive noise term and $N$ denotes the number of temporal snapshots. The signal $s(t)$ and noise $\mathbf{n}(t)$ are assumed independent and are modeled as white (spatially and temporally), zero mean, circular complex Gaussian random variables with second order moments

$$E\left[|s(t)|^2\right] = \mathrm{SNR} \quad \text{and} \quad E\left[\mathbf{n}(t)\mathbf{n}^H(t)\right] = \mathbf{I}. \qquad (2)$$

The  signal  variance  is  thus  equal  to  the  Signal-to-Noise-Ratio  per 
space-time  sample  since  the  noise  variance  has  been  normalized  to 
unity. 
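The data model (1)-(2) can be illustrated with a short simulation. The following is a minimal sketch, not the authors' code; the function names, the example element positions and the chosen values of u and SNR are assumptions made here for illustration only.

```python
import cmath
import math
import random


def steering_vector(d, u):
    """a(u) for positions d given in units of lambda/2 and u = sin(theta)."""
    return [cmath.exp(-1j * math.pi * dk * u) for dk in d]


def complex_gaussian(rng, variance):
    """One zero-mean circular complex Gaussian sample with E|z|^2 = variance."""
    sigma = math.sqrt(variance / 2.0)
    return complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))


def simulate_snapshots(d, u, snr, n_snapshots, seed=0):
    """Draw N snapshots x(t) = a(u) s(t) + n(t) with unit noise variance."""
    rng = random.Random(seed)
    a = steering_vector(d, u)
    snapshots = []
    for _ in range(n_snapshots):
        s = complex_gaussian(rng, snr)  # signal variance equals the (linear) SNR
        snapshots.append([ak * s + complex_gaussian(rng, 1.0) for ak in a])
    return snapshots


# Illustrative values (assumed): 8 elements, D = 23, SNR = -5 dB, N = 16.
d = [0, 1, 2, 11, 15, 18, 21, 23]
snr_linear = 10 ** (-5 / 10)
x = simulate_snapshots(d, u=0.3, snr=snr_linear, n_snapshots=16)
```

Each entry of `x` is one K x 1 snapshot; the first element of the steering vector is always 1 because the first sensor sits at position 0.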
The problem considered in this paper is, given the noisy observations
$\mathbf{x}(t)$, $t = 1, \ldots, N$, to determine the element positions
$d_2, \ldots, d_{K-1}$ that maximize the DOA estimation performance. A
crucial issue that is addressed herein is how to define DOA estimation
performance when the estimation is prone to ambiguities.

3.  ESTIMATION  PERFORMANCE  MEASURES 

Often,  DOA  estimation  performance  is  evaluated  by  means  of  the 
CRB.  This  bound  is  relatively  easy  to  compute  but  is  a  local  bound 
that  does  not  take  into  account  large  errors  that  may  be  caused  by 
near  ambiguities.  Various  global  bounds  have  been  proposed  in 
the  literature.  These  are  more  tedious  to  compute  but,  on  the  other 
hand,  they  provide  insights  into  how  ambiguity  errors  affect  the 
overall estimation error. One such bound is the Weiss-Weinstein
Bound (WWB) [6]. This is a lower bound on the Mean Square
Error (MSE) that rests on the Bayesian framework of estimation.
This means that the parameter of interest is considered to be a random
variable with known prior distribution. Throughout the paper,
a uniform distribution on $[-1, 1]$ is assigned to $u$. For details concerning
the computation of the WWB for DOA estimation, see [7].

To illustrate the difference between the WWB and the CRB,
Figure 1 shows the CRB and WWB as a function of SNR for a
particular NULA with 8 elements¹. At high SNR, the WWB and
CRB coincide since, in this region, ambiguous estimates do not
occur. Below a certain SNR threshold the WWB increases rapidly.
At this threshold, ambiguous estimates from grating lobes begin to
yield a contribution to the total MSE that is comparable to that of
the mainlobe. This threshold effect is not captured by the CRB.
The performance measure that is used in this paper for finding optimal
element positions is this SNR threshold. An array with a
low threshold is likely to provide very accurate estimates and still
be robust to ambiguity errors. The SNR threshold can be defined
in different ways, e.g. as the SNR where the WWB exceeds the CRB by a
certain amount, or as the maximum of the second derivative of the
WWB curve. Since no analytical expressions for the SNR threshold
are derived in this paper, the liberty is taken to identify the SNR
threshold simply by ocular inspection of the produced graphs. It is
likely that the conclusions of the present paper are independent of
the precise definition of SNR threshold.

Fig. 1. CRB and WWB as a function of SNR for a NULA with
d = [0 1 2 11 15 18 21 23], N = 16.

¹ Since the WWB is a Bayesian bound, a direct comparison with the
CRB is not meaningful in general. However, arguments similar to those in
[10] can be used to justify such a comparison.
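One of the threshold definitions mentioned above, the SNR at which the WWB starts to exceed the CRB by a certain amount, is easy to automate. The sketch below is only an illustration of that definition; the 3 dB margin and the sample curve values are made-up assumptions, not results from this paper.

```python
def snr_threshold(snr_db, wwb_db, crb_db, margin_db=3.0):
    """Lowest grid SNR such that the WWB stays within margin_db of the CRB
    at that SNR and at every higher SNR on the grid (None if there is none)."""
    threshold = None
    # Walk from high SNR downwards until the two bounds separate.
    for snr, wwb, crb in sorted(zip(snr_db, wwb_db, crb_db), reverse=True):
        if wwb - crb > margin_db:
            break
        threshold = snr
    return threshold


# Hypothetical curves in dB, chosen only to exercise the function.
snr_grid = [-15, -10, -5, 0, 5, 10]
crb_curve = [-30.0, -35.0, -40.0, -45.0, -50.0, -55.0]
wwb_curve = [-5.0, -12.0, -38.5, -44.8, -50.0, -55.0]
threshold = snr_threshold(snr_grid, wwb_curve, crb_curve)
```

With these assumed curves the bounds separate by more than 3 dB below -5 dB, so `threshold` is -5.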

4.  OPTIMIZATION  OF  ELEMENT  POSITIONS 

The  basic  ideas  behind  the  optimization  procedure  are  as  follows: 

1.  Generate  a  large  number  of  different  arrays  with  random 
element  positions. 

2.  Compute  the  WWB  as  a  function  of  SNR  for  each  array. 

3.  Identify  a  reduced  set  of  arrays  with  the  lowest  SNR  thresh¬ 
olds. 

4.  The  element  positions  of  these  arrays  are  used  as  starting 
points  in  a  numerical  optimization  routine  to  improve  the 
best  arrays  from  the  previous  step. 

5.  The  optimal  element  positions  are  then  taken  from  the  best 
array  after  the  numerical  optimization. 

Importance  was  attached  to  analyzing  as  many  random  arrays  as 
possible  within  a  limited  computing  time.  Therefore,  a  somewhat 
simplified  procedure  was  implemented: 

•  The WWB as a function of SNR was computed for $10^3$
different arrays. The array with the lowest SNR threshold
of these arrays was identified. This array had a threshold at
about SNR = -5 dB. Then, the WWB at SNR = -5 dB was
computed for $10^6$ random arrays.

•  The 10 arrays with the lowest WWB at SNR = -5 dB were
selected for numerical optimization.

•  The element positions of these arrays were used as starting
points when minimizing the WWB with respect to element
positions at SNR = -5 dB. The “fminsearch” routine in
Matlab’s Optimization Toolbox was used for this purpose.

•  The array with the lowest WWB after the numerical optimization
was then considered to be the optimal array.

There  is  no  guarantee  that  the  global  optimum  is  found  with 
this  procedure.  If  a  very  large  number  of  random  arrays  are  gener¬ 
ated,  however,  it  is  likely  that  the  obtained  solution  is  “sufficiently 
optimal”  in  any  practical  application. 
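The simplified procedure above (random candidates, selection of the best few, local refinement from each) can be sketched as follows. This is an assumption-laden illustration: the cost function is passed in as a callable because the WWB computation is not reproduced here, the local step is plain stochastic hill climbing swapped in for the simplex search of “fminsearch”, and `toy_cost` is a hypothetical placeholder that has nothing to do with the WWB.

```python
import random


def random_array(n_elements, length, rng):
    """End elements fixed at 0 and length; interior positions uniform on [0, length]."""
    interior = sorted(rng.uniform(0.0, length) for _ in range(n_elements - 2))
    return [0.0] + interior + [float(length)]


def refine(array, cost, rng, step=0.5, iterations=200):
    """Stochastic hill climbing on the interior positions (stand-in for a simplex search)."""
    best, best_cost = list(array), cost(array)
    length = array[-1]
    for _ in range(iterations):
        trial = list(best)
        k = rng.randrange(1, len(trial) - 1)  # never move the end elements
        trial[k] = min(max(trial[k] + rng.uniform(-step, step), 0.0), length)
        trial[1:-1] = sorted(trial[1:-1])
        trial_cost = cost(trial)
        if trial_cost < best_cost:
            best, best_cost = trial, trial_cost
    return best, best_cost


def optimize_positions(cost, n_elements=8, length=23, n_random=1000, n_starts=10, seed=0):
    """Random search, keep the n_starts best candidates, refine each, return the best."""
    rng = random.Random(seed)
    candidates = [random_array(n_elements, length, rng) for _ in range(n_random)]
    candidates.sort(key=cost)
    refined = [refine(c, cost, rng) for c in candidates[:n_starts]]
    return min(refined, key=lambda pair: pair[1])


# Hypothetical placeholder cost: squared distance to a fixed layout.
TARGET = [0, 1, 2, 11, 15, 18, 21, 23]


def toy_cost(array):
    return sum((p - t) ** 2 for p, t in zip(array, TARGET))


best, best_cost = optimize_positions(toy_cost, n_random=500)
```

In an actual study `toy_cost` would be replaced by a routine returning the WWB at the chosen SNR for the given element positions.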

The optimization procedure was evaluated by generating $10^6$
eight-element linear arrays with random element positions and $D = 23$,
$N = 16$. The element positions were generated according to
a uniform distribution on $[0, D]$. Figure 2 shows the WWB as a
function of SNR for the arrays which had the lowest and highest
WWB at SNR = -5 dB. In order to show the statistical nature of the
WWB of the randomly generated arrays, there is also a histogram
of the WWB at SNR = -5 dB for all the arrays in the plot. The histogram
has been rotated 90° compared to the standard orientation
of a histogram. It can be seen from the figure that the difference
between the WWB for the best and the worst array is quite large.
The positions of the array elements thus have a great influence on
the attainable estimation performance. The element positions of
the 10 best arrays were then used as starting points in a numerical
optimization routine to minimize the MSE at SNR = -5 dB. Finally,
the optimal element positions are taken from the array with
the lowest MSE at this SNR. The numerical optimization reduced
the minimum MSE from -44.8 dB to -45.2 dB.

Fig. 2. WWB for the best and worst array and a histogram of
WWB at SNR = -5 dB.

Hitherto, the element positions were considered as continuous
variables. This implies an infinite number of possible arrays
with different element positions. On the other hand, constraining
the element positions to a discrete grid leads to a finite number of
possible different arrays. Therefore, it should be possible to compute
the WWB for all these possible arrays if the number of grid
points and array elements are not too large. A common approach
is to start with a ULA with $\lambda/2$ element spacing and the required
length. Then, a given number of elements are removed from the
full array in order to produce the sparse array. These arrays are
often called thinned arrays. In the present example, the two end
elements are fixed. Thus, there are 22 element positions from which
to choose 6. The number of different ways to pick 6 elements
out of 22 is equal to $\binom{22}{6} = 74613$. This number is small enough
that the WWB can be computed for all of these arrays
on a standard PC. The WWB at SNR = -5 dB was computed for
all these 74613 arrays and the result is illustrated in Figure 3. The
WWB vs SNR for the arrays with the lowest WWB at SNR = -5
dB using continuous and discrete positions respectively is shown.
Clearly, the difference between the two is negligible.

Fig. 3. WWB for the best array using continuous and discrete
element positions respectively.
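The exhaustive search over thinned arrays is small enough to enumerate directly. The sketch below only generates the candidate geometries and confirms the count quoted in the text; the generator name is an assumption, and evaluating the WWB for each candidate is left out.

```python
from itertools import combinations
from math import comb


def thinned_arrays(length=23, n_elements=8):
    """Yield every thinned array on the integer grid with both end elements kept."""
    for interior in combinations(range(1, length), n_elements - 2):
        yield (0, *interior, length)


n_arrays = comb(22, 6)                      # 6 interior elements on 22 grid points
n_enumerated = sum(1 for _ in thinned_arrays())
first_array = next(thinned_arrays())
```

Both `n_arrays` and `n_enumerated` equal 74613, and each yielded tuple can be passed straight to a bound-evaluation routine.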

5.  COMPARISON  WITH  OTHER  ARRAYS 

The arrays obtained from the optimization procedure described in
the previous section were compared with a few other array configurations
that have been studied in the literature. A type of thinned
array that has been widely studied is the so-called minimum-redundancy
array [11]. Another array configuration that has also been studied
is two separated subarrays where each subarray is a ULA with
$\lambda/2$ inter-element spacing, see e.g. [9, 10]. The array geometry
that minimizes the CRB for NULAs with fixed length is given by
two point clusters at the array end points [3]. Due to mutual coupling
effects and mechanical considerations, the element spacing
cannot be too small. The separated subarrays configuration can
thus be viewed as a realizable approximation of the CRB-optimal
geometry. Figure 4 shows the element positions for the arrays under
consideration. In Figure 5, the WWB of the best thinned array
obtained from the numerical optimization is compared with
the WWB of a minimum redundancy array. There is practically no
difference between the WWB for the two arrays. Therefore, in this
example, the minimum redundancy array can be considered to be
the optimal thinned array with respect to robustness to ambiguity
errors. Recall that the difference between the WWB for the arrays
obtained from minimization over continuous and discrete element
positions respectively was very small. Therefore, it is concluded
that the minimum redundancy array is near optimal in the present
example.

Fig. 4. Element positions (in units of $\lambda/2$).

Fig. 5. WWB for minimum redundancy array and the best thinned
array.

Figure 6 shows WWB vs SNR for the array obtained from
optimization over continuous element positions and the separated
subarray structure. The separated subarray structure has a considerably
higher SNR threshold and somewhat lower WWB at high
SNR as compared to the optimal array. This is expected, since the
separated subarray structure has a narrower mainlobe but higher
sidelobes, due to concentration of the elements near the array endpoints.

Fig. 6. WWB for optimized array, continuous positions and separated
subarrays respectively.

Common engineering practice suggests that low sidelobes are
important for ambiguity-free DOA estimation. In order to investigate
the adequacy of this, Figure 7 displays a 2-D histogram of the
WWB at SNR = -5 dB and the peak sidelobe in the beam pattern
for each of the $10^6$ arrays as a contour plot. Some interesting conclusions
can be drawn from Figure 7. For most arrays, high peak
sidelobe means high WWB. A few ridges can be discerned in the
contour plot. These are probably due to the peak sidelobe being
at different distances from the mainlobe. An ambiguous estimate
from a sidelobe far from the mainlobe gives a greater contribution
to the MSE than from a sidelobe close to the mainlobe. A
remarkable property appears, however, if the lowest contour of the
histogram is scrutinized. Apparently, if only the best arrays are
considered, there seems to be no relationship between the peak
sidelobe and the WWB, at least as long as the peak sidelobe does
not exceed -3 dB (relative to the mainlobe). It is concluded that
high peak sidelobe does not necessarily give high mean square estimation
error, if the element positions are determined judiciously.
Inspection of the corresponding beam patterns revealed that for the
arrays with low WWB and high peak sidelobe, the peak sidelobe
is relatively close to the mainlobe.

Fig. 7. 2-D histogram of WWB at SNR = -5 dB and peak sidelobe
(relative to the mainlobe, in dB), displayed as a contour plot.
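The peak sidelobe level and its distance from the mainlobe, the two quantities discussed above, can be read off the beam pattern numerically. The following is a rough sketch under stated assumptions: the source is taken at boresight so that only offsets in u up to 1 are scanned, the mainlobe is delimited by the first local minimum of the pattern, and the function names and grid size are choices made here.

```python
import cmath
import math


def beam_pattern(d, delta_u):
    """Normalized response |a(u0)^H a(u0 + delta_u)| / K; depends only on delta_u."""
    return abs(sum(cmath.exp(-1j * math.pi * dk * delta_u) for dk in d)) / len(d)


def peak_sidelobe(d, n_grid=4001, max_delta=1.0):
    """Return (level in dB relative to the mainlobe, offset in u) of the peak sidelobe."""
    grid = [max_delta * i / (n_grid - 1) for i in range(n_grid)]
    pattern = [beam_pattern(d, du) for du in grid]
    # Walk down the mainlobe until the pattern starts to rise again.
    i = 1
    while i < n_grid and pattern[i] <= pattern[i - 1]:
        i += 1
    if i == n_grid:
        raise ValueError("no sidelobe found on the scanned interval")
    peak = max(pattern[i:])
    location = grid[i + pattern[i:].index(peak)]
    return 20.0 * math.log10(peak), location


ula = list(range(8))                      # 8-element ULA, half-wavelength spacing
sparse = [0, 1, 2, 11, 15, 18, 21, 23]    # 8-element sparse array, D = 23
ula_level, ula_location = peak_sidelobe(ula)
sparse_level, sparse_location = peak_sidelobe(sparse)
```

For the ULA the routine recovers the familiar first sidelobe of roughly -13 dB; the sparse array has a higher peak sidelobe, and `sparse_location` indicates how far from the mainlobe it lies.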

6.  CONCLUSIONS 

A novel criterion for optimizing the element positions of sparse
linear arrays has been presented. The criterion used was the ambiguity
threshold of the Weiss-Weinstein Bound (WWB). This is a
lower bound on the mean square DOA estimation error that takes
into account large errors caused by ambiguities. An optimization
procedure was implemented in order to find the array with lowest
ambiguity threshold. The WWB for this array was compared with
a minimum-redundancy array and a separated subarrays structure.
It was found that the optimal array and the minimum-redundancy
array had similar performance. Furthermore, it was found that low
peak sidelobe is not necessary for obtaining lowest possible WWB.

HIGH RESOLUTION DF WITH A SINGLE CHANNEL RECEIVER

Chong Meng Samson See *


Abstract - This paper presents a method
for high resolution multiple source direction
finding with an antenna array. Unlike previously
developed algorithms, the proposed approach
can achieve high resolution direction
finding with only ONE receiver, thereby offering
significant hardware savings. The proposed
approach requires the signal received by
the antenna array to be pre-processed by a
beamformer network where each of the beamformer
output ports is sequentially sampled
by an RF switch. As the power of each beamformer
output port is a function of the array
covariance matrix, we derive a Kronecker
form that leads to a unique least squares estimate
of the array covariance matrix using the
power measured from all the beamformer output
ports. With the array covariance matrix
estimated, conventional high resolution DF algorithms
can be applied to estimate the directions
of arrival of the multiple
sources impinging on the antenna array.

I.  Introduction 

High resolution direction finding algorithms enable
an antenna array system to achieve accurate
direction of arrival estimation in the presence
of co-channel signals and multipath. Most high resolution
direction finding (DF) algorithms, such
as MUSIC, ESPRIT, WSF etc., require the number
of receivers to match the number of antennas.
However, such antenna array processing
systems can be costly to implement in applications
that require wide instantaneous
frequency coverage and where weight, size and
volume are subject to tight constraints. In
[1], a method for high resolution direction finding
is proposed which allows the number of antennas
to be larger than the number of receivers.
While the number of receivers needed is significantly
reduced (minimum of two), the proposed
method requires multi-dimensional search algorithms
to determine the DOA of multiple sources,
which generally are computationally demanding
and do not guarantee global convergence. In [2],
the application of the computationally efficient MUSIC
and Capon's beamformer is made possible
by estimating the DOA from a restricted number
of antenna outputs (sub-array). Later in [3],
a similar approach was proposed that combined
the cost functions of the sub-arrays incoherently.
Apart from poorer estimation performance, the
number of signal sources that can be detected and
resolved by these approaches is limited by the
number of elements in the sub-array. Recently,
an approach based on reconstructing the array
covariance from the sub-array data was proposed
in [4]. The significance of this approach is that it
allows the direct application of the MUSIC estimator.

* C.M.S. See is with DSO National Laboratories, 20 Science
Park Drive, Singapore 118230. Tel: 065-8712423.

Direction of arrival estimation processing architectures
using only one receiver channel to
sample the antenna array elements, such as [5],
have been proposed. With the sampling and antenna
switching synchronized, the basic tenet of
these approaches is to sequentially sample the antenna
elements as fast as possible such that the
effect of sequential sampling of the antennas can
be approximated by phase shifts. As a result,
the received signal vector, as in the case of a full
channel antenna array system, can be approximated,
thereby allowing the application of
computationally efficient algorithms like the MUSIC
estimator. However, such an approach requires the
antennas to be sampled at very high rates, which
increase proportionally with receiver bandwidth
and number of antennas. It was recommended
in [5] that the sampling rate should be in
excess of 1 GHz. Unfortunately, the need for very
high speed RF switches and ADC as well as the
ability to handle the large amount of data (due
to the high sampling rate) will increase the cost and
complexity of the DF system, hence diminishing
the gain due to the reduced number of receivers.

In this paper, we present a method for high resolution
DF with an antenna array. Unlike the
approach previously proposed in [5], it can achieve
high resolution DF of multiple signal sources with
only one receiver without the need for high speed
ADC and RF switches. As shown in Figure 1,
in our proposed approach, the signals received
by an N-element antenna array are pre-processed
by an analog beamformer network with M output
ports. The M channel signals are sequentially
sampled by an M-to-1 RF switch. The single
channel output from the RF switch is down-converted
by a single channel receiver and sampled
at the Nyquist rate by an ADC for digital signal
processing.

II.  Proposed  Approach 

The received signal power associated with each
beamformer output, $z_i(t)$, is estimated from the
sampled data and is given by

$$z_i(t) = \boldsymbol{\alpha}_i^H \mathbf{r}(t)\,\mathbf{r}(t)^H \boldsymbol{\alpha}_i + n_i(t)$$

where $\boldsymbol{\alpha}_i$ is the vector of beamformer weights and
$n_i(t)$ is the receiver noise. Without any loss of
generality, we assume that the dominant noise
comes from the receiver. The signal received by
the antenna array, $\mathbf{r}(t)$, is given by

$$\mathbf{r}(t) = \mathbf{A}(\Theta)\,\mathbf{s}(t)$$

where $\mathbf{A}(\Theta) = [\,\mathbf{a}(\theta_1) \;\cdots\; \mathbf{a}(\theta_d)\,]$ with $\mathbf{a}(\theta_i)$
being the steering vector associated with angle
of arrival $\theta_i$ and $\mathbf{s}(t)$ is the source waveform.
The measured power of the beamformer outputs is a
function of the array covariance matrix and the
beamformer weight vector:

$$z_i = E(z_i(t)) = \boldsymbol{\alpha}_i^H \mathbf{R}(\Theta)\,\boldsymbol{\alpha}_i$$

where

$$\mathbf{R}(\Theta) = \mathbf{A}(\Theta)\,\mathbf{P}\,\mathbf{A}(\Theta)^H \quad \text{and} \quad \mathbf{P} = E\left(\mathbf{s}(t)\,\mathbf{s}(t)^H\right).$$

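By writing $\mathbf{z} = [\,z_1 \;\cdots\; z_M\,]^T$, we have

$$\mathbf{z} = \mathrm{diag}\left(\boldsymbol{\Gamma}^H \mathbf{R}(\Theta)\,\boldsymbol{\Gamma}\right) \tag{1}$$

where $\boldsymbol{\Gamma} = [\,\boldsymbol{\alpha}_1 \;\cdots\; \boldsymbol{\alpha}_M\,]$ and the operator
diag(C) returns a vector of the diagonal terms extracted
from C. Given (1), we relate the power of
all the beamformer outputs to the array covariance
by

$$\begin{aligned}
\mathbf{z} &= \mathbf{Q}\,\mathrm{vec}\left(\boldsymbol{\Gamma}^H \mathbf{R}(\Theta)\,\boldsymbol{\Gamma}\right) \\
&= \mathbf{Q}\left(\boldsymbol{\Gamma}^T \otimes \boldsymbol{\Gamma}^H\right)\mathbf{G}\,\mathbf{p} \\
&= \left(\boldsymbol{\Pi}_{\mathrm{Re}} + j\,\boldsymbol{\Pi}_{\mathrm{Im}}\right)\mathbf{p}
\end{aligned}$$

where $\mathbf{Q}$ is a selection matrix such that
diag(C) = $\mathbf{Q}$ vec(C), $\mathbf{p}$ is a real valued parameter
vector and $\mathbf{G}$ is another selection matrix (noting
that $\mathbf{R}(\Theta)$ is a Hermitian matrix) such that
$\mathbf{G}\mathbf{p} = \mathrm{vec}(\mathbf{R}(\Theta))$. The operator $\otimes$ denotes the Kronecker
product, and

$$\boldsymbol{\Pi}_{\mathrm{Re}} = \mathrm{Re}\left(\mathbf{Q}\left(\boldsymbol{\Gamma}^T \otimes \boldsymbol{\Gamma}^H\right)\mathbf{G}\right)$$

and

$$\boldsymbol{\Pi}_{\mathrm{Im}} = \mathrm{Im}\left(\mathbf{Q}\left(\boldsymbol{\Gamma}^T \otimes \boldsymbol{\Gamma}^H\right)\mathbf{G}\right).$$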

With a sufficiently large number of beamformer
outputs $M > N$, the least squares estimate of
the array covariance matrix, $\mathbf{R}(\Theta)$ or $\mathbf{p}$, can be
obtained by evaluating

[…]

where $\boldsymbol{\Pi}_{\mathrm{Im}}$ […]. Once the array covariance
matrix is reconstructed from $\mathbf{p}$, high resolution
DF algorithms, such as MUSIC, can be
used to estimate the DOA of the signal sources.
It is important to point out that the signal power
of each beamformer port can be estimated in the
frequency or in the time domain. When a wideband
receiver is used to achieve wide instantaneous
spectrum coverage, the signal power at the frequency
of interest can be estimated in the frequency
domain using the Fast Fourier Transform.
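The reconstruction of the array covariance matrix from the port powers can be illustrated numerically. The sketch below is not the estimator of this paper: instead of forming the selection matrices and the Kronecker product explicitly, it expands R in a real basis of N² Hermitian matrices, which is an equivalent parameterization chosen here for brevity. Noise-free expected powers, random beamformer weights, and the values N = 4 and M = 20 are assumptions; since a Hermitian N x N matrix has N² real degrees of freedom, the sketch uses at least N² ports.

```python
import numpy as np


def hermitian_basis(n):
    """N^2 Hermitian matrices spanning all N x N Hermitian matrices over the reals."""
    basis = []
    for i in range(n):
        for k in range(i, n):
            e = np.zeros((n, n), dtype=complex)
            if i == k:
                e[i, i] = 1.0
                basis.append(e)
            else:
                e[i, k] = e[k, i] = 1.0        # real part of an off-diagonal pair
                basis.append(e)
                f = np.zeros((n, n), dtype=complex)
                f[i, k], f[k, i] = 1j, -1j     # imaginary part of the same pair
                basis.append(f)
    return basis


def port_powers(gamma, r):
    """z = diag(Gamma^H R Gamma): expected power at each beamformer port."""
    return np.real(np.einsum("im,ik,km->m", gamma.conj(), r, gamma))


def reconstruct_covariance(gamma, z):
    """Least squares fit of a Hermitian R to the measured port powers z."""
    basis = hermitian_basis(gamma.shape[0])
    # Column k holds the port powers that the k-th basis matrix would produce.
    design = np.column_stack([port_powers(gamma, b) for b in basis])
    p, *_ = np.linalg.lstsq(design, z, rcond=None)
    return sum(pk * b for pk, b in zip(p, basis))


rng = np.random.default_rng(0)
n_antennas, n_ports = 4, 20
gamma = rng.standard_normal((n_antennas, n_ports)) + 1j * rng.standard_normal((n_antennas, n_ports))
a = rng.standard_normal((n_antennas, 2)) + 1j * rng.standard_normal((n_antennas, 2))
r_true = a @ a.conj().T + np.eye(n_antennas)   # arbitrary Hermitian test covariance
z = port_powers(gamma, r_true)
r_hat = reconstruct_covariance(gamma, z)
```

With exact powers the fit recovers `r_true` to numerical precision; in practice `z` would be replaced by power estimates averaged over the snapshots taken at each port, and `r_hat` would then be passed to MUSIC or a similar estimator.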
A. A Numerical Example

In this example, we consider a 4-element uniformly
spaced circular antenna array with radius
of $0.8\lambda$, where $\lambda$ is the wavelength of the signals
impinging on the antenna array. The analog
beamformer network has 16 output ports and the
weights of each beamformer port are randomly
generated. Figure 2 depicts a MUSIC spectrum
computed using the array covariance matrix estimated
by the proposed method. Two uncorrelated
signals impinge on the antenna array with SNR
of 5 dB from 70° and 180°, as indicated by the red
dotted lines. The number of snapshots used to
estimate the power at each beamformer output
$z_i$ is 1000. As shown in Figure 2, the proposed
method is able to resolve the sources and estimate
their locations accurately.

III.  Concluding  Remarks 

Apart  from  achieving  significant  hardware  sav¬ 
ings  by  enabling  high  resolution  DF  with  only 
ONE  receiver,  the  proposed  method  also  offers 
the  following  advantages: 

1.  As it derives the array covariance matrix
from the beamformer output power, the
sampling of the beamformer outputs can be
done at a very low rate. Hence, only low speed
RF switches are needed.

2.  It does not require the received signals to be
oversampled, and the ADC sampling rate is
only lower bounded by the Nyquist rate.

3.  When only ONE receiver is used, the DF
processor based on the proposed method
does not require receiver calibration. This
will further simplify and reduce the cost of
hardware.

As seen from these advantages, the proposed
approach can offer a low cost and parsimonious DF
architecture for high resolution direction of arrival
estimation.

Figure 1 - Proposed Architecture.

Figure 2 - MUSIC Spectrum. Dotted lines: True DOA.
M = 20, N = 4, Number of snapshots: 1000.
Array Geometry: Uniform circular array, radius
= 0.8 wavelength.

References

[2]  K.M. Buckley, X.L. Xu, “Recent Advances
in High Resolution Spatial Spectrum Estimation”,
Proc. of EUSIPCO-90, Barcelona,
Spain, Sep. 1990, pp. 17-25.

[3]  J.G. Worms, “RF Direction Finding with a
Reduced Number of Receivers by Sequential
Sampling”, IEEE Conference on Phased Array
Systems and Technology, May 2000.

[4]  E. Fishler, H. Messer, “Multiple source direction
finding with an array of M sensors
using two receivers”, Proceedings of the Tenth
IEEE Workshop on Statistical Signal and
Array Processing, 2000, pp. 86-89.

[5]  US Patent 5,497,161, “Angle of Arrival Solution
using a single receiver”.

ABSTRACT

By  exploiting  the  spatial  and  temporal  properties  of  most 
communication  signals,  we  propose  a  forward-backward  linear 
prediction  (FBLP)  approach  for  estimating  the  directions-of- 
arrival  (DOA)  of  coherent  signals  impinging  on  a  uniform  linear 
array.  In  the  proposed  method,  the  evaluation  of  the  cyclic  array 
covariance  matrix  is  avoided  and  the  difficulty  of  choosing  the 
optimal  lag  parameter  is  alleviated.  As  a  result,  the  proposed 
method  has  two  advantages:  the  computational  load  is  relatively 
reduced  and  the  robustness  of  estimation  is  significantly 
improved.  It  is  shown  through  numerical  examples  that  this 
approach  is  superior  in  resolving  closely  spaced  coherent  signals 
with  small  length  of  array  data  and  at  relatively  low  signal-to- 
noise  ratio  (SNR). 

1.  INTRODUCTION 

For  estimating  the  directions-of-arrival  (DOA)  of  multiple 
narrow-band  signals  from  the  noisy  array  data,  maximum 
likelihood  (ML)  methods  and  subspace-based  methods  are  well 
known.  Subspace-based  methods  such  as  MUSIC  [1]  and  MODE 
(in  a  uniform  linear  array  (ULA))  [2]  are  more  computationally 
efficient  than  the  ML  methods  [3],  but  all  of  them  except  MODE 
are  unsuitable  for  coherent  signals.  To  tackle  the  problem  of 
coherent  signals,  several  modifications  to  the  subspace-based 
methods  have  been  proposed.  Among  them,  spatial  smoothing 
(SS)  [4]  is  a  popular  preprocessing  scheme.  However,  in  array 
processing  of  wireless  communication  systems,  there  are  some 
practical  situations  where  the  overall  number  of  incident  signals 
is  greater  than  the  number  of  sensors  even  though  the  number  of 
desired  signals  is  smaller,  and  multipath  propagation  due  to 
various  reflections  is  often  encountered.  Furthermore,  the  number 
of  snapshots  is  usually  limited.  In  these  scenarios,  the 
performance  of  most  subspace-based  methods  and  their  variants 
will  degrade.  Moreover,  the  subspace-based  methods  basically 
rely  on  the  spatial  information  contained  in  the  received  data, 
whereas  the  temporal  properties  of  the  desired  incident  signals 
are  ignored. 

Most  communication  signals  exhibit  cyclostationarity  for  a 
given  cycle  frequency  because  of  the  underlying  periodicity 
arising  from  carrier  frequencies  or  baud  rates  [7].  Many  direction 
estimation  methods  exploiting  this  inherently  temporal  property 
have  been  developed  recently  (e.g.  [8]),  in  which  the  stationary 
noise  and  the  interfering  signals  that  do  not  share  a  cycle 
frequency  common  to  the  desired  signals  are  suppressed.  For 
estimating  the  directions  of  coherent  cyclostationary  signals,  a 
cyclic  ML  method  [9]  and  an  SS-based  cyclic  MUSIC  method 
[10]  were  proposed.  However,  the  former  is  computationally 
expensive  because  it  involves  a  multidimensional  optimization, 
while  the  latter  is  still  not  computationally  efficient  enough  since 
the  cyclic  correlation  matrices  of  subarrays  must  be  evaluated. 

In  this  paper,  by  utilizing  the  spatial  and  temporal  properties 
of  the  incoming  signals  impinging  on  a  ULA,  we  investigate  an 
efficient  method  for  estimating  the  directions  of  narrow-band 
cyclostationary  signals  in  a  multipath  propagation  environment. 


In  the  proposed  cyclic  method,  the  forward-backward  linear 
prediction  (FBLP)  model  is  incorporated  with  a  subarray  scheme, 
and  the  directions  of  the  desired  coherent  signals  can  be 
estimated  from  the  corresponding  prediction  polynomial.  In  this 
paper,  we  use  multiple  lags  to  exploit  the  cyclic  statistical 
information  efficiently  and  to  alleviate  the  difficulty  of  choosing 
the  optimal  time  lag.  For  achieving  the  best  performance  of  DOA 
estimation,  the  choice  of  the  subarray  size  (i.e.  the  order  of  the 
LP  model  plus  one)  is  considered.  We  derive  an  analytical 
expression  of  error  variance  of  spectral  peak  position  by  using 
linear  approximation  for  sufficiently  high  signal-to-noise  ratio 
(SNR)  and  clarify  the  optimal  subarray  size  for  minimizing  the 
peak  position  variance.  As  a  result,  the  proposed  method  has  two 
advantages:  the  computational  load  is  relatively  reduced  and  the 
robustness  of  estimation  is  significantly  improved.  The 
performance  of  the  proposed  approach  is  verified  through 
numerical  examples. 

2.  PROBLEM  FORMULATION 

2.1  Data  Model  and  Assumptions 

We consider a ULA of $M$ identical and omnidirectional sensors
with spacing $d$, and assume that $p$ narrow-band signals $\{s_k(n)\}$
with zero mean and center frequency $f_c$ are far enough away
and come from distinct directions $\{\theta_k\}$. The received signal
$y_i(n)$ at the $i$th sensor can be expressed by

$$y_i(n) = x_i(n) + w_i(n) \tag{1}$$

$$x_i(n) = \sum_{k=1}^{p} s_k(n)\, e^{j\omega_0 (i-1)\tau_k(\theta)} \tag{2}$$

where $x_i(n)$ and $w_i(n)$ are the noiseless received signal and
additive noise, $\omega_0 = 2\pi f_c$, $\tau_k(\theta) = (d/c)\sin\theta_k$, and $c$ is the
speed of propagation.

The received signals can be rewritten in a compact form as

$$\mathbf{y}(n) = A(\theta)\,\mathbf{s}(n) + \mathbf{w}(n) \tag{3}$$

where $\mathbf{y}(n)$ and $\mathbf{w}(n)$ are the $M \times 1$ vectors of the received
signals and noise, $\mathbf{s}(n)$ is the $p \times 1$ vector of the incident signals,
and the array matrix $A(\theta)$ is given by $A(\theta) = [\mathbf{a}(\theta_1),
\mathbf{a}(\theta_2), \ldots, \mathbf{a}(\theta_p)]$ with $\mathbf{a}(\theta_k) = [1, e^{j\omega_0\tau_k(\theta)}, \ldots, e^{j\omega_0 (M-1)\tau_k(\theta)}]^T$.

In this paper, the array is assumed to be unambiguous.
Without loss of generality, under a frequency-flat multipath
propagation [4], the first $q$ ($1 \le q \le p$ and $q \le 2M/3$) signals are
coherent ones from the desired source expressed by
$s_k(n) = \beta_k s_1(n)$, where $\beta_k$ is the multipath coefficient which
represents the complex attenuation of the $k$th signal with respect
to the first one $s_1(n)$ with $\beta_k \ne 0$ and $\beta_1 = 1$. The desired source
exhibits the second-order cyclostationarity with the cycle
frequency $\alpha$, and it is cyclically uncorrelated with the other
signals at this cycle frequency. The noise $\{w_i(n)\}$ are cyclically
uncorrelated with themselves and with the incident signals at the
considered cycle frequency $\alpha$. The number of coherent signals
$q$ and the cycle frequency $\alpha$ are known or estimated a priori.
2.2  Forward-Backward  Linear  Prediction  with  Subarrays

Here we consider the case that the interfering signals are absent.
The noiseless received signals $\{x_i(n)\}$ in (2) differ only by a
phase factor $\omega_0 \tau(\theta_k)$, so from Prony's method [11], we can find
that the noiseless signals $\{x_i(n)\}$ obey a linear difference
equation [5]. By dividing the array into $L$ overlapping subarrays
of size $m$, where $L = M - m + 1$ and $m \ge q + 1$, i.e. the $l$th
forward subarray comprises sensors $\{l, l+1, \ldots, l+m-1\}$, the
signal $x_{l+m-1}(n)$ can be exactly predicted as follows [4]

$x_{l+m-1}(n) = x_{fl}^T(n) a$  (4)

where $x_{fl}(n) = [x_l(n), x_{l+1}(n), \ldots, x_{l+m-2}(n)]^T$, $a = [a_{m-1}, a_{m-2}, \ldots, a_1]^T$,
and $\{a_i\}$ are the LP coefficients. Similarly by partitioning the full
array into $L$ subarrays with $m$ sensors in the backward direction,
we obtain the backward LP equation for the $l$th backward
subarray as

$x_{L-l+1}^*(n) = x_{bl}^T(n) a$  (5)

where $x_{bl}(n) = [x_{M-l+1}(n), x_{M-l}(n), \ldots, x_{L-l+2}(n)]^H$, and $(\cdot)^*$ and $(\cdot)^H$
denote the complex conjugate and the Hermitian transpose. Then
we get the following FLP and BLP models for the received data

$y_{l+m-1}(n) = y_{fl}^T(n) a + \varepsilon_{fl}(n)$  (6)

$y_{L-l+1}^*(n) = y_{bl}^T(n) a + \varepsilon_{bl}(n)$  (7)

where $y_{fl}(n) = [y_l(n), y_{l+1}(n), \ldots, y_{l+m-2}(n)]^T$, $y_{bl}(n) = [y_{M-l+1}(n),
y_{M-l}(n), \ldots, y_{L-l+2}(n)]^H$, $\varepsilon_{fl}(n)$ and $\varepsilon_{bl}(n)$ are the forward and
backward prediction errors given by $\varepsilon_{fl}(n) = w_{l+m-1}(n) - w_{fl}^T(n) a$
and $\varepsilon_{bl}(n) = w_{L-l+1}^*(n) - w_{bl}^T(n) a$, $w_{fl}(n) = [w_l(n), w_{l+1}(n), \ldots,
w_{l+m-2}(n)]^T$, and $w_{bl}(n) = [w_{M-l+1}(n), w_{M-l}(n), \ldots, w_{L-l+2}(n)]^H$.

The accumulation of the additive noise in $y_{l+m-1}(n)$, $y_{L-l+1}^*(n)$,
$y_{fl}(n)$ and $y_{bl}(n)$ will cause the ordinary least squares (LS) or
minimum-norm estimate from (6) and (7) to become biased and
inconsistent [13], and this estimate will make the DOA
estimation unreliable. In this paper, we thus exploit the inherent
cyclostationarity of most communication signals to suppress the
interfering signals and noise.

3.  FBLP-BASED  CYCLIC  DOA  ESTIMATION

3.1  Cyclic  Correlation  of  Noisy  Data

First the noiseless signal $x_i(n)$ can be rewritten compactly as

$x_i(n) = b_i^T(\theta) s(n) = s^T(n) b_i(\theta)$  (8)

where $b_i(\theta) = [e^{j\omega_0(i-1)\tau(\theta_1)}, \ldots, e^{j\omega_0(i-1)\tau(\theta_p)}]^T$. Then from
the definition of the cyclic correlation [7], and under the model
assumptions, we obtain the cyclic correlation function $r_{y_i y_k}^{\alpha}(\tau)$
between the noisy signals $y_i(n)$ and $y_k(n)$ as

$r_{y_i y_k}^{\alpha}(\tau) = \langle y_i(n) y_k^*(n+\tau) e^{-j2\pi\alpha n} \rangle = b_i^T(\theta) R_s^{\alpha}(\tau) b_k^*(\theta)$  (9)

where $\langle z(n) \rangle = \lim_{N\to\infty} (1/N) \sum_{n=0}^{N-1} z(n)$ denotes the time average of
$z(n)$, $\tau$ is the lag parameter, and $R_s^{\alpha}(\tau)$ is the cyclic covariance
matrix of the source signals given by

$R_s^{\alpha}(\tau) = \langle s(n) s^H(n+\tau) e^{-j2\pi\alpha n} \rangle
= \langle \beta s_1(n) \beta^H s_1^*(n+\tau) e^{-j2\pi\alpha n} \rangle = r_s^{\alpha}(\tau) \beta\beta^H$  (10)

where $\beta$ is the vector of multipath coefficients given by
$\beta = [\beta_1, \ldots, \beta_q, \beta_{q+1}, \ldots, \beta_p]^T$ with $\beta_{q+1} = \cdots = \beta_p = 0$, and $r_s^{\alpha}(\tau)$ is
the cyclic autocorrelation function of the signal $s_1(n)$ given by
$r_s^{\alpha}(\tau) = \langle s_1(n) s_1^*(n+\tau) e^{-j2\pi\alpha n} \rangle$.

Clearly the influence of the arbitrary (not necessarily
stationary and/or spatially white) noise and interference vanishes if
the cycle frequency $\alpha$ is appropriately selected, so the signal
detection capability can be improved. However, because of the
coherency of the $q$ signals from the desired source, we can
easily find that the cyclic matrix $R_s^{\alpha}(\tau)$ is singular, and this
degrades the performance of the ordinary cyclic methods.

3.2  Linear  Prediction  Based  DOA  Estimation

In the absence of interfering signals, from (1), (6) and (9), we
obtain the cyclic correlation $r_{y_{l+m-1} y_M}^{\alpha}(\tau)$ between $y_{l+m-1}(n)$ in the
$l$th forward subarray and $y_M(n)$ as

$r_{y_{l+m-1} y_M}^{\alpha}(\tau) = \langle y_{l+m-1}(n) y_M^*(n+\tau) e^{-j2\pi\alpha n} \rangle
= \langle y_{fl}^T(n) y_M^*(n+\tau) e^{-j2\pi\alpha n} \rangle a = \varphi_{fl}^T(\tau) a$  (11)

where $\varphi_{fl}(\tau) = [r_{y_l y_M}^{\alpha}(\tau), r_{y_{l+1} y_M}^{\alpha}(\tau), \ldots, r_{y_{l+m-2} y_M}^{\alpha}(\tau)]^T$. Equivalently,
we can obtain the cyclic correlation $r_{y_1 y_{L-l+1}}^{\alpha}(\tau)$ between $y_1(n)$
and $y_{L-l+1}(n)$ in the $l$th backward subarray as

$r_{y_1 y_{L-l+1}}^{\alpha}(\tau) = \langle y_1(n) y_{L-l+1}^*(n+\tau) e^{-j2\pi\alpha n} \rangle
= \langle y_1(n) y_{bl}^T(n+\tau) e^{-j2\pi\alpha n} \rangle a = \varphi_{bl}^T(\tau) a$  (12)

where $\varphi_{bl}(\tau) = [r_{y_1 y_{M-l+1}}^{\alpha}(\tau), r_{y_1 y_{M-l}}^{\alpha}(\tau), \ldots, r_{y_1 y_{L-l+2}}^{\alpha}(\tau)]^T$.
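
The time average in (9) has to be approximated from a finite record in practice. The following minimal sketch, with hypothetical function and variable names, computes such a finite-sample estimate for one sensor pair and one lag, summing only over the indices for which both samples exist.

```python
import cmath
import math


def cyclic_correlation(y_i, y_k, alpha, tau):
    """Finite-sample estimate of <y_i(n) y_k*(n + tau) e^{-j 2 pi alpha n}>."""
    n_samples = len(y_i)
    start = max(0, -tau)                      # keep n + tau >= 0
    stop = min(n_samples, n_samples - tau)    # keep n + tau <= N - 1
    total = 0j
    for n in range(start, stop):
        total += y_i[n] * y_k[n + tau].conjugate() * cmath.exp(-2j * math.pi * alpha * n)
    return total / n_samples


# Toy check: a real sinusoid at frequency 0.1 has a cyclic feature at alpha = 0.2.
signal = [complex(math.cos(2.0 * math.pi * 0.1 * n)) for n in range(1000)]
r_match = cyclic_correlation(signal, signal, alpha=0.2, tau=0)
r_mismatch = cyclic_correlation(signal, signal, alpha=0.25, tau=0)
```

For this toy signal the estimate at the matching cycle frequency is 1/4, while the estimate at the mismatched cycle frequency is numerically zero, which is the selectivity the cyclic approach relies on.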

As shown in (9) and (10), even in the presence of interfering
signals, the influence of the interfering signals and noise is
eliminated by exploiting the cyclostationarity, so the prediction
relations (11) and (12) in the cyclic domain remain valid when
the interfering signals are present. Now we consider
the DOA estimation of the desired coherent cyclostationary
signals by utilizing the LP technique. By letting $l = 1$ to $L$, from
(11) and (12), we can obtain the following FBLP equation

$z(\tau) = \Phi(\tau) a$  (13)

where $z(\tau) = [z_f^T(\tau), z_b^T(\tau)]^T$, $\Phi(\tau) = [\Phi_f^T(\tau), \Phi_b^T(\tau)]^T$,
$z_f(\tau) = [r_{y_m y_M}^{\alpha}(\tau), r_{y_{m+1} y_M}^{\alpha}(\tau), \ldots, r_{y_M y_M}^{\alpha}(\tau)]^T$,
$z_b(\tau) = [r_{y_1 y_L}^{\alpha}(\tau), r_{y_1 y_{L-1}}^{\alpha}(\tau), \ldots, r_{y_1 y_1}^{\alpha}(\tau)]^T$,
$\Phi_f(\tau) = [\varphi_{f1}(\tau), \varphi_{f2}(\tau), \ldots, \varphi_{fL}(\tau)]^T$, and
$\Phi_b(\tau) = [\varphi_{b1}(\tau), \varphi_{b2}(\tau), \ldots, \varphi_{bL}(\tau)]^T$.

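To combat the rank deficiency resulting from signal
coherency, we have the following proposition.

Proposition: If the array is partitioned properly to ensure
$2L \ge q$, the rank of the cyclic matrix $\Phi(\tau)$ in (13) equals the
number of the desired coherent signals.

Proof: By defining $A_1(\theta)$ and $A_2(\theta)$ as the submatrices of the
array steering matrix $A(\theta)$ consisting of the first $m-1$ and $L$
rows respectively, after some manipulations, we can obtain [14]

$\Phi(\tau) = r_s^{\alpha}(\tau) \rho_M^* \begin{bmatrix} A_2(\theta) B \\ \rho_1 A_2(\theta) B^* D^{-(M-1)} / \rho_M^* \end{bmatrix} A_1^T(\theta)
= r_s^{\alpha}(\tau) \rho_M^* C B A_1^T(\theta)$  (14)

where $\rho_i = b_i^T(\theta)\beta$, $B = \mathrm{diag}(\beta_1, \beta_2, \ldots, \beta_p)$, $D = \mathrm{diag}(e^{j\omega_0\tau(\theta_1)}, \ldots, e^{j\omega_0\tau(\theta_p)})$,
$C = [A_2^T(\theta), (A_2(\theta)\Gamma)^T]^T$, $\Gamma = \mathrm{diag}(\gamma_1, \ldots, \gamma_q, \gamma_{q+1}, \ldots, \gamma_p)$,
and $\gamma_i = (\rho_1 \beta_i^* / (\rho_M^* \beta_i)) e^{-j\omega_0(M-1)\tau(\theta_i)}$ for $i = 1, 2, \ldots, q$
while $\gamma_i = 0$ for $i = q+1, \ldots, p$.

From the model assumptions, we have $\mathrm{rank}(B) = \mathrm{rank}(\Gamma) = q$
and $\mathrm{rank}(A_1(\theta)) = \min(m-1, p)$ and $\mathrm{rank}(A_2(\theta)) = \min(L, p)$.
Consequently, from the fact that $m \ge q + 1$ and $q \le p$, we can
obtain that $\mathrm{rank}(A_1(\theta)) \ge q$. Additionally, the rank of the matrix
$C$ is given by $\mathrm{rank}(C) = \min(2L, p)$, so $\mathrm{rank}(C) = q$ iff $2L \ge q$.
Thus if $2L \ge q$, the rank of the cyclic matrix $\Phi(\tau)$ is equal to
the number of the desired signals $q$ regardless of the coherence
of these signals. Here the fact $\rho_M \ne 0$ and the assumption
$r_s^{\alpha}(\tau) \ne 0$ are used implicitly.  ■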

However, the matrix $\Phi(\tau)$ is usually rank-deficient because
$q < 2L$ and $q < m-1$, so we use the truncated singular value
decomposition (SVD) to obtain a numerically reliable estimation,
where the SVD of the matrix $\Phi(\tau)$ is given by

$\Phi(\tau) = U \Lambda V^H$  (15)

where $U = [u_1, u_2, \ldots, u_{2L}]$, $V = [v_1, v_2, \ldots, v_{m-1}]$, and $\Lambda = \mathrm{diag}(\lambda_1,
\lambda_2, \ldots, \lambda_{\min(2L, m-1)})$. Then from (13), the minimum-norm estimate
of the LP parameter $a$ is obtained [5], [14]

$\hat{a} = \sum_{i=1}^{q} \lambda_i^{-1} v_i u_i^H z(\tau)$  (16)

Finally by finding the phase of the $q$ zeros of the polynomial
$D(z) = 1 - a_1 z^{-1} - a_2 z^{-2} - \cdots - a_{m-1} z^{-(m-1)}$ closest to the unit circle in
the $z$-plane, or by searching for the $q$ highest peaks of the
spectrum $1/|D(e^{j\omega_0 (d/c) \sin\theta})|^2$, the directions of the desired
coherent signals can be estimated.
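
The peak search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the coefficients are passed in the order $[a_1, \ldots, a_{m-1}]$ (the reverse of the vector $a$ defined after (4)), the grid step is arbitrary, and peaks of $1/|D|^2$ are located as local minima of $|D|^2$ to avoid dividing by a value close to zero.

```python
import cmath
import math


def lp_polynomial_sq_mag(a, theta_deg, spacing_over_wavelength=0.5):
    """|D(z)|^2 at z = e^{j w0 (d/c) sin(theta)}, with D(z) = 1 - sum_i a_i z^{-i}."""
    phase = 2.0 * math.pi * spacing_over_wavelength * math.sin(math.radians(theta_deg))
    z_inv = cmath.exp(-1j * phase)
    d = 1.0 - sum(a_i * z_inv ** (i + 1) for i, a_i in enumerate(a))
    return abs(d) ** 2


def doa_from_lp(a, q, grid_step_deg=0.1):
    """Return the q grid angles at the deepest local minima of |D|^2."""
    n_points = int(round(180.0 / grid_step_deg)) + 1
    grid = [-90.0 + i * grid_step_deg for i in range(n_points)]
    vals = [lp_polynomial_sq_mag(a, t) for t in grid]
    minima = [(vals[i], grid[i]) for i in range(1, n_points - 1)
              if vals[i] <= vals[i - 1] and vals[i] <= vals[i + 1]]
    minima.sort()
    return sorted(theta for _, theta in minima[:q])


# Noise-free toy case: D(z) = (1 - z1/z)(1 - z2/z) has zeros at the two DOAs.
z1 = cmath.exp(1j * math.pi * math.sin(math.radians(-10.0)))
z2 = cmath.exp(1j * math.pi * math.sin(math.radians(4.0)))
estimated = doa_from_lp([z1 + z2, -z1 * z2], q=2)
```

In this noise-free case the two returned angles fall on the grid points nearest to -10 and 4 degrees.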

3.3  Cyclic  DOA  Estimation  Algorithm 

As the cyclic correlation function is dependent on the lag
parameter $\tau$ [7], if the cyclic correlation of one source is zero or
insignificant for a given $\tau$, then this signal will not be resolved.
The choice of the optimal lag parameter is important in cyclic
methods [8], but it is rarely available. To alleviate the
difficulty in choosing the optimal lag and to exploit the cyclic
statistics effectively, we use multiple lags to obtain a robust
estimate of the LP parameter $a$. By concatenating (13) for
$\tau = -Q, \ldots, -1, 0, 1, \ldots, Q$, we can obtain a modified cyclic vector-matrix
form as

$z = \Theta a$  (17)

where $z = [z^T(-Q), \ldots, z^T(-1), z^T(0), z^T(1), \ldots, z^T(Q)]^T$ and $\Theta =
[\Phi^T(-Q), \ldots, \Phi^T(-1), \Phi^T(0), \Phi^T(1), \ldots, \Phi^T(Q)]^T$. Here we can
choose $Q$ large enough so that $r_s^{\alpha}(\tau)$ is non-zero and
significantly varying for $|\tau| \le Q$ [14]. Then we can estimate the
directions of the desired coherent signals with the cycle
frequency $\alpha$ from (17).

In summary, the proposed FBLP-based DOA estimation
algorithm from the finite array data $\{y_1(n), y_2(n), \ldots, y_M(n)\}_{n=0}^{N-1}$ is
as follows.

a)  Set the subarray size $m$ to satisfy $m \ge q + 1$ and $2L \ge q$,
where $L = M - m + 1$.

b)  Calculate the estimates of the cyclic correlations $r_{y_i y_M}^{\alpha}(\tau)$
and $r_{y_1 y_k}^{\alpha}(\tau)$ for $\tau = -Q, \ldots, -1, 0, 1, \ldots, Q$ as

$\hat{r}_{y_i y_k}^{\alpha}(\tau) = (1/N) \sum_{n=0}^{N-1-\tau} y_i(n) y_k^*(n+\tau) e^{-j2\pi\alpha n}$, for $\tau \ge 0$  (18)

$\hat{r}_{y_i y_k}^{\alpha}(\tau) = (1/N) \sum_{n=-\tau}^{N-1} y_i(n) y_k^*(n+\tau) e^{-j2\pi\alpha n}$, for $\tau < 0$  (19)

where $i = 1, 2, \ldots, M$ and $k = M$ for $\hat{r}_{y_i y_M}^{\alpha}(\tau)$, while $k = 1, 2,
\ldots, M$ and $i = 1$ for $\hat{r}_{y_1 y_k}^{\alpha}(\tau)$.

c)  Form the estimated cyclic vector $\hat{z}$ and matrix $\hat{\Theta}$ as (17)
by using (18), (19) and (13).

d)  Perform the SVD on the estimated matrix $\hat{\Theta}$ as (15), where
$L$ is replaced by $\bar{L} = (2Q + 1)L$.

e)  Calculate the estimate of the LP parameter $a$ as

$\hat{a} = \sum_{i=1}^{q} \lambda_i^{-1} v_i u_i^H \hat{z}$  (20)

f)  Estimate the DOA of the signals from the $q$ highest peak
locations of the spectrum given by $1/|\hat{D}(e^{j\omega_0 (d/c) \sin\theta})|^2$.

Remark: Calculating the cyclic correlations for multiple lags
takes approximately $52 N_\tau N M$ flops, where a flop is defined as a
floating-point addition or multiplication operation as adopted by
MATLAB. The number of flops needed by the SVD of matrix
$\hat{\Theta}$ is of the order $O((2LN_\tau)^2 (m-1))$, while the computation of $\hat{a}$
requires $8(m-1)(q^2 + 2LN_\tau q + 2LN_\tau) + q$ flops. Thus a rough
estimate of the number of MATLAB flops required by the
dominant steps in the implementation of the proposed approach is
$52 N_\tau N M$ when $N \gg M$, where the computations needed by the
remaining steps are negligible.

3.4  Optimal  Subarray  Size

For estimating the directions of the $q$ coherent signals, from
the Proposition, it follows that the subarray size $m$ (i.e. the order of
the prediction model plus one) must be chosen to satisfy the
inequality $q + 1 \le m \le M - q/2 + 1$ [14]. The choice of the optimal
value of $m$ is crucial to achieve the best performance of
direction estimation [5], but it generally depends on the number
of desired coherent signals, the SNR and the angle separation of
incident signals.

Now we investigate the choice of the subarray size to
minimize the variance of peak position error of the spectrum
$1/|D(e^{j\omega_0 (d/c) \sin\theta})|^2$. The derivation of the error variance of spectral
peak position for direction estimation is tedious, so here we only
give the result for sufficiently high SNR. As the interfering
signals are suppressed in the proposed cyclic approach, for
notational simplicity, we assume that the interfering signals are
absent and that the noise is temporally and spatially uncorrelated
white complex Gaussian noise, i.e. $p = q$, $\beta_k \ne 0$ for $k = 1, 2,
\ldots, p$, and $E\{w_i(n) w_k^*(n)\} = \sigma^2 \delta_{ik}$ and $E\{w_i(n) w_k(n)\} = 0$, where
$E\{\cdot\}$ and $\delta_{ik}$ denote the expectation and Kronecker delta. As the
true parameters $\{a_i\}$ with order $m-1$ can be determined exactly
by using the method of undetermined coefficients [11], by
adopting the linear approximation as used in [12], we can obtain
the variance for the peak position error in terms of noise variance,
signal power and subarray size as follows [14]

$\mathrm{var}(\hat{\omega}_k) = \sigma^2 / (2(m-1)L^2|\beta_k|^2 r_s)$,   for $m \le M/2 + 1$
(3iii(m  -  2)  —  2Z7  +  2)o-  /  3»r(m-lFL|/?J’ ;;   for $m \ge M/2 + 1$   (21)

where $\omega_k$ denotes the "spatial frequency" $\omega_0 \tau(\theta_k)$ for
convenience, and $r_s = E\{s_1(n) s_1^*(n)\}$. Therefore we can find that
$\mathrm{var}(\hat{\omega}_k)$ increases with subarray size $m$ for $m \ge M/2 + 1$ while
$\mathrm{var}(\hat{\omega}_k)$ has its minimum at about $m = M/3 + 1$ for $m \le M/2 + 1$,
since $(m-1)L^2$ with $L = M - m + 1$ is maximized there.
It is straightforward to show that the minimum variance of $\hat{\omega}_k$
(and hence $\hat{\theta}_k$) can be obtained when $m \approx M/3 + 1$.
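
The subarray-size constraints can be explored numerically. The sketch below is an illustration with hypothetical function names: it lists the integer sizes allowed by $q + 1 \le m \le M - q/2 + 1$ and, among those with $m \le M/2 + 1$, picks the one that maximizes $(m-1)L^2$ with $L = M - m + 1$, which is the quantity in the denominator of the high-SNR variance expression for that range.

```python
def feasible_subarray_sizes(num_sensors, q):
    """Integer subarray sizes m with q + 1 <= m <= M - q/2 + 1 (and m <= M)."""
    return [m for m in range(q + 1, num_sensors + 1)
            if m <= num_sensors - q / 2 + 1]


def best_subarray_size(num_sensors, q):
    """Feasible m <= M/2 + 1 maximizing (m - 1) * L^2, where L = M - m + 1."""
    candidates = [m for m in feasible_subarray_sizes(num_sensors, q)
                  if m <= num_sensors / 2 + 1]
    return max(candidates, key=lambda m: (m - 1) * (num_sensors - m + 1) ** 2)


# M = 8 sensors and q = 2 coherent signals, as in the numerical examples.
sizes = feasible_subarray_sizes(8, 2)   # [3, 4, 5, 6, 7, 8]
m_best = best_subarray_size(8, 2)       # 4, close to M/3 + 1
```

For $M = 8$ the candidate values of $(m-1)L^2$ are 72, 75 and 64 for $m = 3, 4, 5$, so the integer optimum is $m = 4$.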


4.  NUMERICAL  EXAMPLES 

The effectiveness of the proposed cyclic FBLP-based direction
estimation method is illustrated through numerical examples, in
which the desired coherent binary phase-shift keying (BPSK)
signals can be distinguished from the interfering BPSK signals
with different cycle frequencies. In the simulations, the sensor
separation of the ULA with $M = 8$ is half-wavelength, where
$f_c = 8$ MHz, $c = 3 \times 10^8$ m/s, the sensor outputs are collected at
the rate $f_s = 8$ MHz, and the lag parameter $Q$ is chosen as $Q = 10$.
The BPSK signals have a raised-cosine pulse shape with
50% excess bandwidth. The additive noise is temporally and
spatially uncorrelated white complex Gaussian noise with zero-mean
and variance $\sigma^2$. The SNR is defined as the ratio of the
power of the source signals to that of the noise at each sensor.
The results shown below are all based on 100 independent trials.

Example 1: Performance versus SNR

The direct-path signal from the BPSK 1 source impinges on the
array from angle $\theta_1 = -10°$ with 1.6 MHz baud rate ($\alpha = 0.2$
normalized to the sampling rate [8]), while one coherent arrival
comes from $\theta_2 = 4°$ with multipath coefficient $\beta_2 = 1$. There is
one interfering BPSK 2 signal that arrives from $\theta_3 = 0°$ with
2.0 MHz baud rate ($\alpha = 0.25$). The number of snapshots and the
subarray size are $N = 512$ and $m = 5$. The SNR of the desired signal is
varied, while that of the interference is fixed at 10 dB. The root
mean-squared-errors (RMSEs) of the estimates and the Cramér-Rao
lower bound (CRLB) [3] versus SNR are shown in Fig. 1.
Because SS-based MUSIC [4] and the smoothed LP method [6] do
not exploit the temporal properties of the incoming signals, they
are unable to distinguish the desired signals from the interference
correctly even when the dimension of the signal subspace is assumed
to be the number of coherent signals. Although the RMSE of the
estimate $\hat{\theta}_2$ obtained by MODE [2] decreases as the SNR
increases, the performance of MODE degrades severely at low
SNR, and the estimate $\hat{\theta}_1$ has a rather large RMSE. Except at
very low SNR, the proposed approach performs better, and it is
more accurate than SS-based cyclic MUSIC [10] with its RMSE
very close to the CRLB at higher SNR.


[Figure: Estimation Performance versus SNR ($\theta_1 = -10°$)]

Fig. 1  RMSEs of the estimates versus SNR (dotted: SS-based
MUSIC; dashed: smoothed LP; dash-dot: SS-based
cyclic MUSIC; solid with "o": MODE; solid: the proposed
approach; and dash-dots: CRLB).

[Figure: two panels, horizontal axis: Subarray Size m]

Fig. 2  ERMSEs of the estimates versus subarray size
(dotted: SS-based MUSIC; dashed: smoothed LP; dash-dot:
SS-based cyclic MUSIC; solid: the proposed approach; and
dash-dots: empirical CRLB).

Example  2:  Performance  versus  Subarray  Size 
The simulation parameters are the same as those in Example 1,
except that the subarray size $m$ is varied from 3 to 8. For
measuring the overall estimation performance, we define an
"empirical RMSE (ERMSE)" of the estimated directions as

$\mathrm{ERMSE} = \sqrt{\frac{1}{qK} \sum_{k=1}^{K} \sum_{i=1}^{q} (\hat{\theta}_i^{(k)} - \theta_i)^2}$  (22)

where $K$ is the number of trials, and $\hat{\theta}_i^{(k)}$ is the estimate
obtained in the $k$th trial. Under the SNRs of the desired signal of
-2.5 dB, 0 dB, 5 dB and 17.5 dB, the ERMSEs of the estimates
against subarray size are shown in Fig. 2, where the "empirical
CRLB" is calculated by averaging the corresponding CRLBs
over the number of coherent signals. We find that the best
estimation can usually be attained when $m$ is about $M/3 + 1$ for
medium and high SNR, while a reasonable estimation can be
obtained with a larger value of $m$ for low SNR.
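
The ERMSE in (22) is straightforward to compute from stored trial results. A minimal sketch, assuming the estimates are held as one list of $q$ angles per trial (a data layout chosen here for illustration):

```python
import math


def ermse(estimates, true_angles):
    """Empirical RMSE over K trials and q angles; estimates[k][i] is angle i in trial k."""
    q = len(true_angles)
    k_trials = len(estimates)
    squared_error = sum((trial[i] - true_angles[i]) ** 2
                        for trial in estimates for i in range(q))
    return math.sqrt(squared_error / (q * k_trials))


# Two trials, each off by exactly one degree on both angles.
value = ermse([[-9.0, 5.0], [-11.0, 3.0]], [-10.0, 4.0])
```

In this toy case every error is one degree, so the ERMSE is exactly 1.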

5.  CONCLUSIONS 

For  estimating  the  directions  of  coherent  cyclostationary  signals 
impinging  on  a  ULA,  we  proposed  a  new  cyclic  FBLP  method. 
In  order  to  improve  the  estimation  performance,  multiple  lag 
parameters  are  used  to  exploit  the  cyclic  statistics  sufficiently  and 
effectively.  The  optimal  subarray  size  that  minimizes  the  peak 
position  variance  was  derived  using  linear  approximation  for 
sufficiently  high  SNR.  As  a  result,  the  proposed  method  has  two 
advantages:  the  computational  load  is  relatively  reduced  and  the 
robustness  of  estimation  is  significantly  improved.  The 
performance  of  the  proposed  method  was  verified  through 
numerical  examples. 

ABSTRACT 

We present a novel algorithm for the estimation of the direction
of arrival and angular distribution parameters of sources that, as a
result of scattering effects, cannot be considered as point sources.

The algorithm is iterative and is based on the maximization
of the likelihood function associated with the received snapshots at
the antenna array (ML). A computationally efficient method is
proposed for estimating the localization and angular distribution
parameters of more than one source transmitting at the same
frequency in a noisy environment. This algorithm solves the problem
of the joint maximization in the case of two sources by formulating
two new single-source ML problems.

Key Words- Array signal processing, DOA estimation,
distributed sources, statistical parameter estimation.

1.  INTRODUCTION 

Classically, the methods for the estimation of the direction of
arrival (DOA) have considered point sources and spatio-temporal
thermal white noise. This problem can be assumed equivalent to
that of frequency detection based on temporal diversity. However,
whereas in spectral analysis two different frequencies are always
totally uncorrelated, in spatial diversity two signals impinging
from different angles can be partially correlated.

Multipath propagation implies an increase in the temporal
correlation between signals from different directions, which degrades
the performance of the classical spectral analysis techniques.
Besides the spatial smoothing techniques, the most effective
solution to this problem is represented by the spreading systems,
such as radar "pulse compression techniques" and spread spectrum
communications (DSSS), which are based on an increase of
the signal bandwidth. By means of these techniques, the classical
non-parametric spectral analysis algorithms can be applied,
converting the DOA detection in a multipath environment into multiple
DOA detection problems with uncorrelated point sources, even with
minimum time shift differences between echoes.

However, the presence of scatterers near the transmitter with
no relative delays between different DOAs degrades the performance
of the spreading systems. The source must be considered
as distributed, and therefore the classical methods may fail
because of the high spatio-temporal correlation. As in the case of
specular multipath, this problem cannot be solved by manipulating
the transmitted signal. The distributed source is characterized by a
unique temporal waveform and a spatial signature. Although the
scattering may change over time, it can be considered time-invariant
within the frame duration in most cases. That means that over
long periods of time, the correlation matrix of the received
snapshots, considering a noise-free environment, is full-rank [1] [2],
whereas over short periods, due to the local analysis of the scenario,
each source contributes a rank-one covariance matrix, so it is
completely coherent during a frame. Our interest is to characterize
this quasi-static behaviour of a source, and not the inter-frame or
inter-scan changes [1] [2] [3].

TIC98-0703,  TIC99-0849,  TIC2000-1025,  FTT-070000-2000-649;  and  the 
Catalan  Government  (CIRIT)  2001  FI  007 14,  2000SGR  00083. 

The proposed technique consists in maximizing the likelihood
function associated with the parameters of the angular distribution of
the sources. This represents a multi-variable maximization problem
with a very high computational cost. Solutions based on EM
(Estimate & Maximize) [4] [5] or AP (Alternating Projection) [6]
and RCAP (Reduced Complexity Array Processing) [7] [8] convert
this problem into multiple one-dimensional problems.

This paper presents an algorithm for the estimation of the spatial
signature of multiple distributed sources with rank-one
contributions, that is, estimation within the time duration of a frame.
In the case of a more prolonged observation period, the classical
spectral analysis methods can be applied. In general terms, this
work presents the generalization of the AP and RCAP techniques
to the case of distributed sources.

2.  SIGNAL  MODEL 

In the case of a single source scenario and an array of antennas,
a known angle distribution $f_0(\theta, n)$ can be assumed, whose mean
is $\theta_0$ and where $n$ is the temporal index of the received snapshot. The
snapshot model is as follows [9]:

$x_n = a(n) \int_{-\pi/2}^{\pi/2} f_0(\theta, n) s(\theta) d\theta + w_n = a(n) b_n + w_n$

$b_n = \int_{-\pi/2}^{\pi/2} f_0(\theta, n) s(\theta) d\theta$  (1)

where $a(n)$ is the complex envelope of the transmitted signal, $s(\theta)$
is the steering vector for a point source at the elevation angle
$\theta$ and $w_n$ is the noise contribution at the front-end. In this model
the complex envelope is the same for all the angles of arrival of the
source, so it is totally correlated. The goal is to estimate the spatial
signature $b_n$ of the source, as defined in (1).

In the case of long or inter-frame observation periods, the spatial
signature can change, and so its temporal correlation can be
exploited by the system to update the estimate of the source movement
or position, in the case of low-mobility environments. The
inter-frame spatial signature tracking system can be based on a
Kalman filter, although this is outside the scope of this work [8].

Our interest is centered on the spatial signature estimate for
the case of non-varying sources. This is the general situation in
a space-time diversity scheme at the receiver and/or the transmitter
side of the communication system, in which an estimate of $b$
is required within a frame duration. Although this signature can
change in time periods longer than the frame duration, it can be
assumed constant within one frame and the $N$ received snapshots,
so the signal model is as follows:

$x_n = a(n) b + w_n$  (2)

where the spatial signature $b$ no longer depends on the temporal
index $n$. This is the same situation as the one
described in [2] for the case of Coherently Distributed Sources. In
that case, the angular distribution $f_0(\theta)$, called deterministic
angular signal density, is assumed to be known.

In order to estimate the spatial signature, the preamble and
reference symbols in the frame may be used. In the problem we
consider, no reference signal is used. In this case, and taking
equation (2) as a basis, the estimated covariance matrix for a scenario
with $NS$ independent sources is as follows:

$\hat{R} = \frac{1}{N} \sum_{n=0}^{N-1} x_n x_n^H = \sum_{s=1}^{NS} \alpha_s b_s b_s^H + \sigma^2 I$  (3)

$\sigma^2 I = E\{w_n w_n^H\}$,   $\alpha_s = E\{|a_s(n)|^2\}$

It will be assumed in the theoretical description of the algorithm
that there is no error in the estimation of the covariance matrix.
In the simulations, the impact of the variation of the number
of snapshots will be shown.

Our goal is the estimation of the parameters of the distribution
$f_0(\theta)$. It is practical to consider a discrete model of the spatial
signature instead of its integral model (1). We consider the
contribution of the same signal impinging from $M$ different angles,
where $M$ is much larger than the total number of sensors $Q$ [9]:

$b = \sqrt{Q} \frac{S f_0}{\|S f_0\|}$,   $\frac{b^H b}{Q} = 1$  (4)

where $S = [\, s(\theta(0)) \;\; s(\theta(1)) \;\; \cdots \;\; s(\theta(M-1)) \,]$ and we
have normalized the spatial signature $b$ so that the mean signal
power measured at the sensors of the array is equal to the signal
power $\alpha_s$. $\theta(k)$ represents the discretized angular axis.

At this point, it is only necessary to parametrize the distribution.
The experimental results and measurements at 2 GHz indicate
that the best-fitting model is the exponential one, with its
mean value situated at the real position of the source. The
discretization of the angular distribution is expressed in the vector
$f_0 = [\, f_0(0) \;\; f_0(1) \;\; \cdots \;\; f_0(M-1) \,]^T$, where $f_0(m) =
\exp(-c_0 |\theta(m) - \theta_0|)$, $0 \le m \le M-1$. In the simulations, not
only a Laplacian, but also Gaussian and rectangular profiles have
been tested. In all the cases, the performance of the algorithms
described throughout the paper is good. In the simulations section,
the results for the case of a Laplacian distribution are shown.
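
The discrete model (4) with the Laplacian profile can be sketched as follows. Several choices here are assumptions made only for illustration and are not stated in the text: a uniform linear array with half-wavelength spacing for the steering vector $s(\theta)$, a uniform one-degree angular grid, and a spread parameter $c_0$ expressed per degree. All function and parameter names are hypothetical.

```python
import cmath
import math


def spatial_signature(theta0_deg, c0, num_sensors, num_angles=181,
                      spacing_over_wavelength=0.5):
    """Return b = sqrt(Q) * S f0 / ||S f0|| for a Laplacian angular profile."""
    # Discretized angular axis theta(m), m = 0..M-1, from -90 to +90 degrees.
    grid = [-90.0 + 180.0 * m / (num_angles - 1) for m in range(num_angles)]
    # f0(m) = exp(-c0 * |theta(m) - theta0|)
    f0 = [math.exp(-c0 * abs(t - theta0_deg)) for t in grid]
    # Accumulate S f0 sensor by sensor (assumed ULA steering vectors).
    s_f0 = []
    for i in range(num_sensors):
        acc = 0j
        for t, weight in zip(grid, f0):
            phase = 2.0 * math.pi * spacing_over_wavelength * math.sin(math.radians(t))
            acc += weight * cmath.exp(1j * phase * i)
        s_f0.append(acc)
    norm = math.sqrt(sum(abs(v) ** 2 for v in s_f0))
    scale = math.sqrt(num_sensors) / norm
    return [scale * v for v in s_f0]


b_spread = spatial_signature(theta0_deg=10.0, c0=0.5, num_sensors=8)
b_narrow = spatial_signature(theta0_deg=10.0, c0=50.0, num_sensors=8)
```

Both signatures satisfy $b^H b = Q$ by construction. With a very large $c_0$ the profile collapses onto a single grid angle and the signature approaches the unit-modulus steering vector of a point source.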

In section 3 our goal is to estimate the spread parameter $c_0$
and the mean angle of arrival $\theta_0$ based on the estimate of the
covariance matrix in the case of a single-source scenario. In the next
sections, the generalization to the case of a two-source environment
is discussed through algorithms based on AP and RCAP.


3.  SINGLE  SOURCE  ESTIMATION 


It is well known that the maximum likelihood (ML) estimator of
the spatial signature $b$ in an AWGN environment is as follows:

$\hat{b} = \arg\max_{b} \frac{b^H \hat{R} b}{b^H b} \;\Rightarrow\; \hat{b} = k e$,   $\hat{R} e = \lambda_{max} e$,   $\|e\| = 1$  (5)

so it is an eigenvector problem, where the eigenvector associated
with the maximum eigenvalue must be chosen. Due to errors in the
estimation of the covariance matrix, it is possible that the
eigenvector $e$ does not exactly fit the parametric definition of the
spatial signature $b$ as expressed in (4). An MSE (Minimum Square
Error) criterion is proposed so as to fit the distribution parameters
to the eigenvector $e$:

$[\hat{\theta}_0, \hat{c}_0, \hat{\beta}_0] = \arg\min \| \sqrt{\lambda_{max}}\, e - \beta_0 b(\theta_0, c_0) \|$  (6)

where $|\hat{\beta}_0|$ is an estimate of the RMS value of the source and $b$
is defined as shown in (4). The validity of the expression is based
on the idea that the maximum eigenvalue is an approximate
measurement of the source power in a single-source and typical SNR
environment. In the case of extremely low SNR conditions, a noise
calibration should be carried out.

The mean value of the angular distribution can be easily
estimated through the following expression, where the spatial
response of the eigenvector $e$ is calculated:

$\hat{\theta}_0 = \arg\max_{\theta_0} | s^H(\theta_0) e |^2$  (7)

We admit that this estimator has a bias. However, for sources
situated within the angle view [-40°, 40°] and typical spreadings
in mobile communications, the deviation is minimal. The great
advantage is that, by making use of this estimator, a two-dimensional
search (mean angle and spreading parameter) is avoided in (5).

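The estimate of the parameter $|\hat{\beta}_0|$ can be expressed as a
function of the estimate of the spatial signature, so that only a
one-dimensional search on the spreading parameter $c_0$ is necessary
based on the MSE expression (6). For a fixed $\hat{b}$, the value of $\beta_0$
that minimizes (6) is the least-squares projection coefficient:

$\hat{b} = \sqrt{Q} \frac{S \hat{f}_0}{\|S \hat{f}_0\|}$,   $\hat{f}_0 = f_0(\hat{\theta}_0, \hat{c}_0)$,   $\hat{\beta}_0 = \frac{\hat{b}^H \sqrt{\lambda_{max}}\, e}{\hat{b}^H \hat{b}}$  (8)

At this point, it is important to highlight that the traditional
methods suffer significant degradation when trying to estimate
parameters of distributed sources using the classical point source
model. It is the same degradation as that produced by a system
with an uncalibrated array.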


4.  TWO  SOURCES  ESTIMATION 

Now the previous method is extended to the case of a scenario
with two distributed sources. The extension of the algorithm
to the case of more sources is direct, although the computational
cost grows considerably. In most real communication systems,
it is reasonable to assume that only two independent
sources with non-negligible power level are radiating, where one of
them can be considered as the desired signal and the other as the
interference.

In the following presentation, the spatial signatures $b_1$ and
$b_2$ will be used. The algorithm is iterative, and in each step the
parameters of one of the sources are estimated, based on the
previous estimates of the other source parameters. The simulations
show that the algorithm converges and that the quality of
the estimates improves as the number of iterations grows. In this
paper, one of the steps is presented, where a previous estimate of
the second source is assumed:

£2  =  v/Q-rr^n  ?2  =  fo(<?2,c2)  %  =  (9) 

Sf2  b‘2  b2 

Based  on  this  estimate,  it  is  possible  to  estimate  the  covariance 
matrix  Ri  of  the  snapshots  without  the  contribution  of  the  second 
source.  The  algorithm  presented  in  section  3  is  now  applied  to  Ri 
so  as  to  estimate  the  parameters  of  the  first  source.  The  mechanism 
must  be  applied  iteratively  to  obtain  admissible  estimates. 

Rx  =R- |32|2b2b.f  (10) 

One delicate point in this algorithm concerns the second-source RMS power estimate $\hat{\beta}_2$ in (9), which is similar to the one deduced by Li and Stoica [10]. Equation (10) does not guarantee that $\hat{R}_1$ is positive definite. The maximum source power estimate that guarantees it is $(\hat{b}_2^H \hat{R}^{-1} \hat{b}_2)^{-1}$, while the MSE-based estimate $|\hat{\beta}_2|^2$ from (6) and (9) may be greater. However, the simulations show that the algorithm works well, although some of the eigenvalues may be negative in the first steps of the iterative mechanism.
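The positive-definiteness issue can be illustrated numerically. The following is a minimal sketch, assuming a random Hermitian positive definite matrix as a stand-in for the sample covariance and a random vector as a stand-in for the spatial signature; the function names `subtract_source` and `max_admissible_power` are hypothetical and not part of the paper.

```python
import numpy as np

def subtract_source(R, b2, power2):
    """Covariance with the second source removed, in the spirit of (10)."""
    return R - power2 * np.outer(b2, b2.conj())

def max_admissible_power(R, b2):
    """Largest power p for which R - p * b2 b2^H remains positive semidefinite."""
    return 1.0 / np.real(b2.conj() @ np.linalg.solve(R, b2))

rng = np.random.default_rng(0)
Q = 6  # number of antennas (illustrative value)
G = rng.standard_normal((Q, Q)) + 1j * rng.standard_normal((Q, Q))
R = G @ G.conj().T + np.eye(Q)  # Hermitian positive definite stand-in for the sample covariance
b2 = rng.standard_normal(Q) + 1j * rng.standard_normal(Q)  # stand-in spatial signature

p_max = max_admissible_power(R, b2)
# Slightly below the bound all eigenvalues stay positive; slightly above, one turns negative.
eig_below = np.linalg.eigvalsh(subtract_source(R, b2, 0.99 * p_max))
eig_above = np.linalg.eigvalsh(subtract_source(R, b2, 1.01 * p_max))
print(eig_below.min(), eig_above.min())
```

An implementation that wants to avoid negative eigenvalues could clip the power estimate at `p_max`; the text above indicates that the iterative mechanism also tolerates them in the first steps.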

The algorithm presented up to this point, based on (10), is called the "subtraction method". We now present another method based on a blocking matrix, as in the AP [6] and RCAP [8] algorithms, which we call the "blocking method". In this case we consider the second source as Gaussian noise with a known covariance matrix. Our goal is to estimate the parameters of the first source by maximizing the log-likelihood associated with the received snapshots. Taking into account all the snapshots, and assuming that the symbols emitted from the second source are independent and the noise is white, the estimated spatial signature for the first source is calculated as follows (see Appendix):

$$\hat{b}_1 = \arg\max_{b_1} \frac{b_1^H P \hat{R} P b_1}{b_1^H P b_1} \;\Rightarrow\; \hat{b}_1 = k e, \qquad \hat{R} P e = \lambda_{\max} e, \quad \|e\| = 1 \qquad (11)$$


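where $P$, called the blocking matrix, is defined as follows:

$$P = I - \phi_2 b_2 b_2^H, \qquad \phi_2 = \frac{\mathrm{SNR}_2}{1 + b_2^H b_2\, \mathrm{SNR}_2}, \qquad \mathrm{SNR}_2 = \frac{|\beta_2|^2}{\sigma^2} \qquad (12)$$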


Equation (11) is a modified eigenvector problem very similar to the one that arises in the AP and RCAP algorithms. As explained in section 3, once the eigenvector $e$ is calculated, the spatial signature $b_1$ and the distribution parameters of the first source (mean DOA and spreading parameter) must be fitted as expressed in the next equation, based on an MSE criterion:


/?xbx(0],ci) 

(13) 


The constant multiplying the eigenvector is found by calculating the maximum eigenvalue of the matrix $\hat{R}P$, which is the one that solves equation (11): $\lambda_{\max} = |\beta_1|^2 \|b_1\|^2 - |\beta_1|^2 \phi_2 |b_1^H b_2|^2$, assuming a sufficiently high SNR environment.

The blocking matrix is defined by the parameter $\phi_2$, which depends on the level of the source to be blocked. In the case of AP or RCAP, this blocking is independent of the source level. In this sense, the algorithm presented in this paper copes better with variations of the source levels. Only for high SNR conditions is the blocking matrix equal to that of AP and RCAP:

$$\mathrm{SNR}_2 \to \infty:\ \phi_2 \to \|b_2\|^{-2}; \qquad \mathrm{SNR}_2 \to 0:\ \phi_2 \to \mathrm{SNR}_2 \qquad (14)$$

If more than two sources are considered, the matrix $P$ is defined as the inverse of the correlation matrix of the noise plus the $N_S - 1$ source signals other than the one whose parameters are being estimated. This can be easily deduced from the Appendix.
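One step of the blocking method can be sketched numerically. The example below is only an illustration under stated assumptions: it takes $P = I - \phi_2 b_2 b_2^H$ with $\phi_2 = \mathrm{SNR}_2/(1 + b_2^H b_2\,\mathrm{SNR}_2)$, uses random vectors as stand-ins for the spatial signatures, and works with the exact covariance instead of a sample estimate. The names `blocking_matrix` and `dominant_eigvec` are hypothetical.

```python
import numpy as np

def blocking_matrix(b2, snr2):
    """P = I - phi2 * b2 b2^H with phi2 = SNR2 / (1 + ||b2||^2 * SNR2)."""
    phi2 = snr2 / (1.0 + np.real(b2.conj() @ b2) * snr2)
    return np.eye(b2.size) - phi2 * np.outer(b2, b2.conj()), phi2

def dominant_eigvec(R, P):
    """Unit-norm eigenvector of R @ P for the eigenvalue with the largest real part."""
    vals, vecs = np.linalg.eig(R @ P)
    i = np.argmax(vals.real)
    return vals[i].real, vecs[:, i] / np.linalg.norm(vecs[:, i])

rng = np.random.default_rng(1)
Q = 8  # number of antennas (illustrative)
b1 = rng.standard_normal(Q) + 1j * rng.standard_normal(Q)  # stand-in signature, source 1
b2 = rng.standard_normal(Q) + 1j * rng.standard_normal(Q)  # stand-in signature, source 2
beta1_sq, beta2_sq, sigma2 = 2.0, 3.0, 0.1                 # illustrative powers

R2 = sigma2 * np.eye(Q) + beta2_sq * np.outer(b2, b2.conj())
R = beta1_sq * np.outer(b1, b1.conj()) + R2  # exact covariance of both sources plus noise

P, phi2 = blocking_matrix(b2, beta2_sq / sigma2)
lam_max, e = dominant_eigvec(R, P)

# With the exact covariance, e is parallel to b1 and the eigenvalue has a closed form.
alignment = abs(e.conj() @ b1) / np.linalg.norm(b1)
lam_pred = beta1_sq * (np.linalg.norm(b1) ** 2 - phi2 * abs(b1.conj() @ b2) ** 2) + sigma2
print(alignment, lam_max, lam_pred)
```

The closed-form eigenvalue carries an additive $\sigma^2$ term that becomes negligible at high SNR, which is where the expression quoted in the text applies. With a sample covariance the eigenvector is only approximately parallel to $b_1$, so the subsequent fit of the distribution parameters is still required.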

5.  SIMULATIONS  AND  RESULTS 

Some simulation results on the performance of the algorithms are now presented. We consider two independent distributed sources at 8° and -10°, with spreading parameters 0.5 and 0.1 respectively. The smaller the spreading parameter, the more distributed the source.

Fig. 1 shows the RMS error in the mean angle estimation of the two sources as a function of the number of antennas and the algorithm. The SNR for each source is 10 dB, and the number of snapshots used to estimate the covariance matrix is 100. Each point on the curves is obtained from 100 simulation runs.


Fig. 1. RMS error in angle estimation vs. number of antennas.


It can be observed that the angle estimate of the source at -10°, which is the most distributed, presents a higher error. It can also be seen that the blocking method performs better than the subtraction method.

The impact of the SNR on the estimation of the spreading parameters is shown in Fig. 2. The simulation parameters are the same as in the previous figure, except that now 8 antennas are used and both sources have the same level. SNR refers to the signal-to-noise ratio of each source. For high SNR conditions, both methods perform similarly.


Fig. 2. RMS error in spreading parameter estimation vs. SNR.


In Fig. 3, the dependence on the number of snapshots used to estimate the covariance matrix is analysed. Both signal sources have an SNR of 10 dB and the number of antennas is 8.


Fig. 3. RMS error in angle estimation vs. number of snapshots.


The number of snapshots can seriously affect performance, especially when it is low. For a high number of snapshots, the performance of both methods is equivalent. In practice, however, the frame duration and the number of snapshots over which a source can be considered non-varying are generally limited.

In all cases, the initial values in the simulations were based on the assumption that only one source was present in the scenario. The simulations show that this initialization does not degrade the performance of the algorithm.

6.  APPENDIX 

We present here the mathematical formulation of the ML estimator described in section 4 for the case of the "blocking method", where in each step one of the sources is considered as Gaussian noise. The PDF is $f_x(x; \beta_1, b_1, \sigma^2, \beta_2, b_2)$:

$$f_x(x) = \prod_{n=0}^{N-1} \frac{1}{\pi^Q \det(R_2)} \exp\left\{-(x_n - a_1(n) b_1)^H R_2^{-1} (x_n - a_1(n) b_1)\right\}$$

$$R_2 = \sigma^2 I + |\beta_2|^2 b_2 b_2^H, \qquad P = \sigma^2 R_2^{-1} = I - \phi_2 b_2 b_2^H$$

The log-likelihood $\Lambda_x = \log(f_x(x))$ is expressed as follows:

$$\Lambda_x = K - \sum_{n=0}^{N-1} (x_n - a_1(n) b_1)^H R_2^{-1} (x_n - a_1(n) b_1)$$


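Maximizing with respect to the information symbols, they can be estimated as expressed in the next equations:

$$\hat{a}_1(n) = \frac{b_1^H R_2^{-1} x_n}{b_1^H R_2^{-1} b_1}, \qquad P_1 = I - \frac{b_1 b_1^H P}{b_1^H P b_1}$$

$$\Lambda_x = K - \sum_{n=0}^{N-1} x_n^H P_1^H R_2^{-1} P_1 x_n = K - \frac{N}{\sigma^2}\,\mathrm{tr}\left(P_1^H P P_1 \hat{R}\right) = K - \frac{N}{\sigma^2}\,\mathrm{tr}\left(P P_1 \hat{R}\right) = K - \frac{N}{\sigma^2}\,\mathrm{tr}\left(P_1 \hat{R} P\right)$$

$$= K - \frac{N}{\sigma^2}\left[\mathrm{tr}\left(P \hat{R}\right) - \frac{b_1^H P \hat{R} P b_1}{b_1^H P b_1}\right]$$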

The estimate of the spatial signature for the first source is obtained by maximizing $\Lambda_x$:

$$\hat{b}_1 = \arg\max_{b_1} \frac{b_1^H P \hat{R} P b_1}{b_1^H P b_1} \;\Rightarrow\; \hat{R} P e = \lambda_{\max} e, \quad \|e\| = 1$$

ABSTRACT

The windowed linear local polynomial approximation (LPA) of the time-varying direction-of-arrival (DOA) is developed for nonparametric high-resolution estimation of multiple moving sources. The method gives estimates of the instantaneous values of the directions as well as their first derivatives. The asymptotic variance and bias of these estimates are derived and used for optimal window size selection. Marginal beamformers are proposed for estimation and source visualization. These marginal beamformers are able to localize and track every source individually, nulling signals from all other moving sources. Recursive implementations of the estimation algorithms are developed for two different tasks: estimation of DOAs with a varying number of sources, and multiple source tracking in time.

1.  INTRODUCTION 

Localization and tracking of multiple narrowband moving sources by a passive array is one of the fundamental problems in radar, communication, sonar, seismology and other areas. In recent years significant progress has been achieved through the development and application of source movement models for DOA estimation. These techniques are mainly based on maximum likelihood (ML), leading to Kalman-style recursive algorithms [7] or expectation-maximization algorithms [2]. Our approach is mainly in line with the ideas of the ML and model identification approaches, which are known to give high-resolution estimates of DOAs [1].

The recently developed local polynomial approximation (LPA) beamforming originated as an adjustment of the conventional beamformer to nonstationary environments with a single moving source [3]-[5] and with multiple moving sources. It has been shown that LPA beamforming yields a very useful visualization of the DOA of rapidly moving sources, as well as improved estimation and high resolution of closely spaced sources. In this paper we use the LPA beamformers to obtain efficient Gauss-Newton recursive tracking algorithms.

2.  PROBLEM  FORMULATION 

Let a uniform linear array of $n$ sensors receive $q$ narrowband signals impinging from far-field sources with unknown time-varying directions $\theta_r(t)$, $r = 1, \ldots, q$. Assume that the $n \times 1$ array observation vector $x(t)$ can be expressed as

$$x(t) = \sum_{r=1}^{q} a_r s_r(t) + \varepsilon(t) = A(\Theta(t))\, s(t) + \varepsilon(t), \qquad (1)$$

$$A(\Theta(t)) = [a_1, \ldots, a_q], \quad a_r = a(\theta_r(t)), \quad r = 1, \ldots, q,$$

where $A(\Theta)$ is the $n \times q$ direction matrix, $a(\theta)$ is the $n \times 1$ steering vector, $\Theta = (\theta_1, \ldots, \theta_q)^T$ is the $q \times 1$ vector of DOAs, $s(t)$ is the $q \times 1$ vector of source waveforms, and $\varepsilon(t)$ is the $n \times 1$ vector of sensor noise.

Let us define the steering vector as $a(\theta) = (1, q, \ldots, q^{n-1})^T$, where $q = \exp\{-j \frac{2\pi}{\lambda} d \sin\theta\}$, $d$ is the interelement spacing, $\lambda$ is the wavelength, and $(\cdot)^T$ stands for transpose. Assume that the sensor noise is a white zero-mean circular process with $E\{\varepsilon^H \varepsilon\} = \sigma^2$. The problem is to find estimates $\hat{\theta}_r(t)$ of the DOAs $\theta_r(t)$ from the observations (1). It is assumed that the directions $\theta_r(t)$ are arbitrary functions of time belonging to a nonparametric class of piecewise continuously differentiable functions.

3.  LPA  ESTIMATION  OF  DOA 

Let $C = (c_0, c_1)^T$ be a generic notation for the 2D vector with $c_0$ and $c_1$ giving estimates of $\theta(t)$ and $\theta^{(1)}(t)$ respectively; $C$ with a superscript "$k$" in square brackets designates similar estimates for the $k$th source, $C^{[k]} = (c_{0,k}, c_{1,k})^T$. Let $\Theta(t) = (\theta_1(t), \ldots, \theta_q(t))^T$ and $\Theta^{(1)}(t) = (\theta_1^{(1)}(t), \ldots, \theta_q^{(1)}(t))^T$ be the vectors of the true values of the DOAs and their first derivatives at the time instant $t$. We also use the $q \times 1$ vectors $C_0 = (c_{0,1}, \ldots, c_{0,q})^T$, $C_1 = (c_{1,1}, \ldots, c_{1,q})^T$ and the $q \times 2$ matrix $C = (C_0\ C_1)$.

Multiple-source nonparametric LPA estimates of the DOAs are defined as a solution of the optimization problem [5]:

$$\hat{C}(t) = \arg\min_{C,\, s(t+u)} J_h(C, t), \qquad (2)$$

$$J_h(C, t) = \sum_u w_h(u)\, \|e(t+u)\|^2, \qquad (3)$$

where $e(t+u) = x(t+u) - A(C_0 + C_1 u)\, s(t+u)$. The criterion function $J_h(C, t)$ is a measure of the quality of fit of the observations $x(t+u)$ by the model $A(C_0 + C_1 u)\, s(t+u)$ in the neighborhood of the center $t$. The window $w_h(u) = w(u/h)/h$ formalizes the localization of the fit, $u$ denotes the shift of an observation snapshot with respect to the center $t$, while the scale parameter $h > 0$ determines the length of the window. It can be shown that $A(C_0 + C_1 u)\, s(t+u)$ fits the output of the array $x(t+u)$ with $C_0$ and $C_1$ as estimates of $\Theta(t)$ and $\Theta^{(1)}(t)$. It follows from (2) that

$$\hat{s}(t+u) = [A^H A]^{-1} A^H x(t+u), \qquad A = A(C_0 + C_1 u). \qquad (4)$$

Inserting $\hat{s}(t+u)$ into (2)-(3) results in

$$\hat{C}(t) = \arg\max_{C} P,$$

$$P = \sum_u w_h(u)\, x^H(t+u)\, Q_A\, x(t+u), \qquad (5)$$

$$Q_A = A[A^H A]^{-1} A^H, \qquad (6)$$

where $Q_A$ is the projection matrix onto the column space of $A \triangleq A(C_0 + C_1 u)$.

For $q = 1$, (5) gives the LPA beamformer power [3] as

$$P_{LPA}(C) = \frac{1}{n} \sum_u w_h(u)\, |a^H(c_0 + c_1 u)\, x(t+u)|^2. \qquad (7)$$
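As an illustration of the single-source case, the sketch below evaluates the LPA beamformer power on a grid of candidate $(c_0, c_1)$ pairs for a noiseless source whose DOA changes linearly over the window. It assumes a half-wavelength ULA, a rectangular window and a unit waveform; the function names and all numerical values are hypothetical choices for the example.

```python
import numpy as np

def steering(theta, n, d_over_lambda=0.5):
    """Rows are a(theta)^T, with a(theta) = (1, q, ..., q^(n-1)) and q = exp(-j*2*pi*(d/lambda)*sin(theta))."""
    theta = np.atleast_1d(theta)
    phase = -2j * np.pi * d_over_lambda * np.outer(np.sin(theta), np.arange(n))
    return np.exp(phase)

def lpa_power(X, u, w, c0, c1):
    """LPA beamformer power for one candidate (c0, c1); X holds one snapshot x(t+u) per row."""
    A = steering(c0 + c1 * u, X.shape[1])
    y = np.sum(A.conj() * X, axis=1)  # a^H(c0 + c1*u) x(t+u) for every shift u
    return np.sum(w * np.abs(y) ** 2) / X.shape[1]

n = 8                                  # sensors
u = np.arange(-10, 11)                 # shifts inside the window (in samples)
w = np.full(u.size, 1.0 / u.size)      # rectangular window weights
theta0, theta1 = 0.3, 0.01             # true DOA (rad) and its rate (rad/sample) at the window center
X = steering(theta0 + theta1 * u, n)   # noiseless snapshots with unit waveform

c0_grid = np.linspace(0.1, 0.5, 41)
c1_grid = np.linspace(-0.02, 0.02, 21)
power = np.array([[lpa_power(X, u, w, c0, c1) for c1 in c1_grid] for c0 in c0_grid])
i0, i1 = np.unravel_index(np.argmax(power), power.shape)
c0_hat, c1_hat = c0_grid[i0], c1_grid[i1]
print(c0_hat, c1_hat)
```

A grid search is used only for clarity; a recursive Gauss-Newton refinement, as pursued in the paper, would replace it in a tracking setting.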


Let the sampling be periodic with sampling period $T$. In the above formulas we assume that $u = kT$ and that the summation is taken over the observations within the window $w_h(kT) = \frac{T}{h} w\left(\frac{kT}{h}\right)$. Let the angle $\theta_k(t)$ be a continuously differentiable function of time, $\theta_k(t+u) = \theta_k(t) + \theta_k^{(1)}(t)\, u + \theta_k^{(2)}(t)\, u^2/2 + \ldots$, and let the waveform $s_k(t)$ be constant in a small neighborhood of the center $t$. Then the accuracy of the estimates (5) can be presented in the following structured form [5].

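Proposition 1 Let $h \to 0$, $T \to 0$, and $h/T \to \infty$. Then:

(1) The bias of the estimates is given as

$$D_h E\{\Delta C^{[k]}\} \simeq -\Phi_1^{-1} \int w(u) \left(h^2 \theta_k^{(2)}(t)\, \tfrac{1}{2} u^2 + h^3 \theta_k^{(3)}(t)\, \tfrac{1}{6} u^3\right) U\, du,$$

$$\Phi_1 = \int w(u)\, U U^T du, \qquad U^T = (1, u).$$

(2) The covariance matrix of the estimates is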

Let us consider $P$ as a 2D conditional function of the parameters of the $k$th source, $C = C^{[k]}$, provided that all other parameters $C^{[r]}$, $r \ne k$, are fixed. Rewrite $A$ in (6) as a structured matrix $A = [a_k\ A_k]$, where the $n \times (q-1)$ matrix $A_k$ is $A$ with the $k$th column $a_k$ omitted. Then $Q_k = A_k [A_k^H A_k]^{-1} A_k^H$ is a projector onto the subspace spanned by the columns of $A_k$, $Q_k^\perp = I - Q_k$, and $I$ is the $n \times n$ identity matrix. The only term of $P$ depending on $C$ can be represented in the form [6]:

$$P_{MARG}(C, t, q) = \sum_u w_h(u)\, \frac{|a_k^H Q_k^\perp x(t+u)|^2}{a_k^H Q_k^\perp a_k}. \qquad (8)$$

In this notation $q$ is the total number of sources and $k$ indicates the source to which this partial power function is assigned, while all other variables $C^{[r]}$, $r = 1, \ldots, q$, $r \ne k$, are fixed. We call $P_{MARG}(C, t, q)$ a marginal LPA beamformer. Thus, by definition, the marginal beamformer is the varying part of the array output power depending on $C^{[k]}$. It works as a conventional beamformer for the $k$th source with a generalized sidelobe canceller nulling the signals from all other sources.

The marginal power $P_{MARG}(C, t, q)$ contains all the information needed for optimization over $C^{[k]}$, and the $2q$-dimensional optimization problem (5) can be replaced by $q$ two-dimensional optimization problems (with motivation similar to that given for unmoving sources in [6]):

$$(\hat{\theta}_k(t), \hat{\theta}_k^{(1)}(t)) = \arg\max_{C} P_{MARG}(C, t, q). \qquad (9)$$
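The split of the array output power into a part that is fixed by the other sources and a marginal part for the $k$th source rests on a standard projector identity, $Q_A = Q_k + Q_k^\perp a_k a_k^H Q_k^\perp / (a_k^H Q_k^\perp a_k)$. The following sketch verifies it for a single snapshot, using a random complex matrix as a stand-in for the direction matrix; the helper name `projector` is hypothetical.

```python
import numpy as np

def projector(A):
    """Orthogonal projector onto the column space of A."""
    return A @ np.linalg.solve(A.conj().T @ A, A.conj().T)

rng = np.random.default_rng(3)
n, q, k = 6, 3, 0  # sensors, sources, index of the source of interest (illustrative)
A = rng.standard_normal((n, q)) + 1j * rng.standard_normal((n, q))  # stand-in direction matrix
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)            # one snapshot

a_k = A[:, k]
A_k = np.delete(A, k, axis=1)   # A with the k-th column omitted
Q_A = projector(A)
Q_k = projector(A_k)
Q_perp = np.eye(n) - Q_k

full_power = np.real(x.conj() @ Q_A @ x)   # term of P in (5) for this snapshot
fixed_part = np.real(x.conj() @ Q_k @ x)   # does not depend on the k-th source
marginal = abs(a_k.conj() @ Q_perp @ x) ** 2 / np.real(a_k.conj() @ Q_perp @ a_k)
print(full_power, fixed_part + marginal)
```

Because `fixed_part` does not involve $a_k$, maximizing the full power over the $k$th source parameters is the same as maximizing `marginal` summed over the window.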

4.  ACCURACY  ANALYSIS 

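Let the estimation errors be defined as vectors $\Delta C^{[k]} = (\Delta\theta_k, \Delta\theta_k^{(1)})^T$, $\Delta\theta_k = \theta_k(t) - c_{0,k}$, $\Delta\theta_k^{(1)} = \theta_k^{(1)}(t) - c_{1,k}$,

$$D_h\, \mathrm{cov}(\Delta C^{[k]})\, D_h = \frac{T}{h} L_k\, \Phi_1^{-1} \Phi_2 \Phi_1^{-1}, \qquad \Phi_2 = \int w^2(u)\, U U^T du, \qquad D_h = \mathrm{diag}(1, h), \qquad (10)$$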


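where

$$L_k = \frac{\lambda^2}{2 \eta_k(t) (2\pi d)^2\, \mathrm{SNR}_k}, \qquad \mathrm{SNR}_k = |s_k(t)|^2 / \sigma^2,$$

and

$$\eta_k(t) = a^H(\theta_k(t))\, B Q_k^\perp B\, a(\theta_k(t)) - \frac{|a^H(\theta_k(t))\, Q_k^\perp B\, a(\theta_k(t))|^2}{a^H(\theta_k(t))\, Q_k^\perp\, a(\theta_k(t))},$$

$$B = \mathrm{diag}(0, 1, 2, \ldots, n-1).$$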

The proof is based mainly on Taylor series expansions, assuming that the estimation errors as well as all disturbances are small.
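A quick numerical sanity check of $\eta_k$ is possible in the single-source case. The sketch assumes the form $\eta_k = a^H B Q_k^\perp B a - |a^H Q_k^\perp B a|^2 / (a^H Q_k^\perp a)$ with $B = \mathrm{diag}(0, 1, \ldots, n-1)$ and sets $Q_k^\perp$ to the identity, in which case the value should equal $n(n^2-1)/12$. The function name `eta` and the numerical values are illustrative.

```python
import numpy as np

def eta(a, Q_perp):
    """eta = a^H B Q B a - |a^H Q B a|^2 / (a^H Q a), with B = diag(0, 1, ..., n-1)."""
    n = a.size
    B = np.diag(np.arange(n)).astype(float)
    first = np.real(a.conj() @ B @ Q_perp @ B @ a)
    cross = abs(a.conj() @ Q_perp @ B @ a) ** 2
    return first - cross / np.real(a.conj() @ Q_perp @ a)

n = 8
theta = 0.3  # arbitrary direction in radians
a = np.exp(-1j * np.pi * np.sin(theta) * np.arange(n))  # half-wavelength ULA steering vector

eta_single = eta(a, np.eye(n))   # single source: nothing to project out
print(eta_single, n * (n ** 2 - 1) / 12)
```

For $n = 8$ both numbers equal 42, independent of the chosen direction, since the steering vector entries have unit modulus.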

Comments on the proposition: (1). The single-source case can be considered as a particular case of the derived results with $Q_k^\perp = I_{n \times n}$. Then $\eta_k(t) = n(n^2 - 1)/12$ and

$$L_k = \frac{6 \lambda^2}{n(n^2 - 1)(2\pi d)^2\, \mathrm{SNR}}. \qquad (11)$$

(2). The parameter $\eta_k(t)$ is the only term in the above formulas that depends on the fact that the multiple-source case is considered.

(3). Assume that the window is symmetric, $w(u) = w(-u)$. Then the formulas of Proposition 1 simplify and the results can be presented in the following explicit form:

(a) For the bias

$$E\{\Delta\theta_k\} = -\frac{h^2}{2}\, \theta_k^{(2)}(t) \int w(v) v^2 dv, \qquad (12)$$

(b) For the variance

$$\mathrm{var}\{\Delta\theta_k\} = \frac{T}{h} L_k \int w^2(v)\, dv, \qquad L_k = \frac{\lambda^2}{2 \eta_k(t) (2\pi d)^2\, \mathrm{SNR}_k}. \qquad (13)$$


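(4). Consider the mean squared error (MSE) of estimation using the formulas (12)-(13). It is clear that the MSE

$$E\{(\Delta\theta_k)^2\} = \frac{T}{h} L_k \int w^2(v)\, dv + \left(\frac{h^2}{2}\, \theta_k^{(2)}(t) \int w(v) v^2 dv\right)^2$$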


8Pp\k) 


A(C-tcl)  „  (2nd\- 

:dcT  ~  V  a  ) 

E,  ,cos2(c0 +  C]tt) 
Wh(u) - -rrm -  X 


LPA 

dCdC 


°-HQka 


{xH{t  +  u)QkRRHQkx{t  +  u)}UUT, 
R  =  Ba  +  a  •  a. 


where $a = a(c_0 + c_1 u)$, $B = \mathrm{diag}\{0, 1, \ldots, n-1\}$, $\alpha = (a^H B Q_k^\perp a - a^H Q_k^\perp B a) / a^H Q_k^\perp a$, and the vector $a$ as a function of $c_0 + c_1 u$ is defined by (1).

Comparison of the two types of algorithms mentioned is definitely in favor of the latter, which is much simpler to implement and provides similar tracking accuracy.

has a minimum in $h$ which defines the optimal window size as

$$h_{opt} = \left(\frac{T L_k \int w^2(v)\, dv}{\left(\theta_k^{(2)}(t) \cos\theta_k(t) \int w(v) v^2 dv\right)^2}\right)^{1/5} \qquad (15)$$

ABSTRACT

We consider the direction finding problem in partly calibrated arrays composed of nonidentical subarrays which are displaced by unknown vector translations. A new computationally efficient algorithm is developed for the recently proposed RAnk REduction (RARE) estimator [1].

1.  INTRODUCTION 

Direction-of-arrival (DOA) estimation of narrowband sources in large sensor arrays composed of multiple subarrays has recently attracted significant attention, because using subarrays on a sparse grid extends the array aperture without a corresponding increase in hardware and software costs [2]. Also, exploiting some particular subarray structure may enable simple formulations of the DOA estimation problem. For example, a search-free polynomial rooting-based formulation of the MUSIC algorithm has been obtained in [3] for sensor arrays composed of identical (and identically oriented) subarrays displaced by arbitrary but known translations. An essential shortcoming of this approach is that the subarrays are restricted to be identical ULAs and exact knowledge of all sensor positions is required. Obviously, such knowledge may be unavailable in large array systems, where calibration of the whole array is usually a much more challenging task than calibration of each subarray and, additionally, subarray positions may change with time.

Recently, a new search-free eigenstructure-based approach to DOA estimation has been proposed [1], which overcomes the aforementioned shortcomings of [3] and other self-calibration techniques. This approach is referred to as the RAnk REduction (RARE) estimator and is applicable to partly calibrated arrays which may involve several nonidentical but identically oriented subarrays displaced by arbitrary unknown vector translations. In [1], it has been shown that RARE approaches the corresponding Cramér-Rao Bound (CRB) and enjoys simple implementation, which entails computing the eigendecomposition of the sample array covariance matrix and polynomial rooting. However, the computation of the coefficients of the RARE polynomial is quite a complicated problem.

In this paper, we obtain a new condition on the number of subarrays which guarantees the uniqueness of the RARE DOA estimates, present a procedure to determine the degree of the RARE polynomial, and develop an efficient technique to compute its coefficients.

2.  RARE  ESTIMATOR 

Consider an array of $M$ omnidirectional sensors which receives $L < M$ narrowband signals impinging from the unknown DOAs $\{\theta_1, \ldots, \theta_L\}$. Let this array consist of $K$ identically oriented linear subarrays whose interelement spacings are integer multiples of the known shortest baseline $d$. The geometry of each subarray is assumed to be known, whereas the inter-subarray displacements are assumed to be unknown. An example of such an array (composed of four subarrays) is shown in Fig. 1. Note that unlike [2] and [3], the subarrays are allowed to be nonidentical to each other and some of them may even consist of a single sensor$^1$ (as the fourth subarray in Fig. 1).

Let $M_k \ge 1$ be the number of sensors of the $k$th subarray, so that $M = \sum_{k=1}^{K} M_k$. Note that $M_k$ may take different values for various subarrays. For the sake of simplicity, it is convenient to define each subarray by means of a certain translation of a subset of the sensors of an $\bar{M}$-element nominal (virtual) uniform linear array (ULA), where $\bar{M} \ge M$. This representation is illustrated in Fig. 2 for the specific case$^2$ $\bar{M} = M$ where the second, third, and fourth subarrays of Fig. 1 are interpreted as a result of three unknown vector

$^1$ However, note that a certain condition formulated below must be fulfilled for the number of subarrays.

$^2$ Note that in what follows, mostly the case $\bar{M} = M$ will be considered, where each sensor of the virtual ULA becomes part of some subarray.

Fig. 1. A particular example of the considered type of sensor array: first subarray (+), second subarray (□), third subarray (△), fourth subarray (*).

translations $\{\xi_k,\ k = 1, 2, 3\}$. In the general case of $K$ subarrays, the $K - 1$ translation vectors $\{\xi_1, \xi_2, \ldots, \xi_{K-1}\}$ are required to determine the array geometry (we assume without loss of generality that $\xi_0 = 0$). The problem is to estimate the DOA vector $\theta = [\theta_1, \theta_2, \ldots, \theta_L]^T$ from array observations.

It can be readily shown that the narrowband model for the $M \times 1$ steering vector can be written as [1]

$$a(\theta, \alpha) = Q(\theta)\, T\, h(\theta, \alpha) \qquad (1)$$

where the $2(K-1) \times 1$ vector $\alpha = \mathrm{vec}\{\Xi\}$ and the $(K-1) \times 2$ matrix $\Xi = [\xi_1, \xi_2, \ldots, \xi_{K-1}]^T$. The vector $\alpha$ combines all unknown inter-subarray displacement parameters,

$$h(\theta, \alpha) = \left[1,\ e^{j(2\pi/\lambda)\xi_1^T \phi},\ \ldots,\ e^{j(2\pi/\lambda)\xi_{K-1}^T \phi}\right]^T$$

$$Q(\theta) = \mathrm{diag}\left\{1,\ e^{j(2\pi/\lambda) d \sin\theta},\ \ldots,\ e^{j(M-1)(2\pi/\lambda) d \sin\theta}\right\}$$

$\phi = [\sin\theta, \cos\theta]^T$, and $\lambda$ is the wavelength. The $M \times K$ full column rank selection matrix $T$ consists of zeros and ones and shows how the sensors of the nominal ULA are distributed among the subarrays, so that the $(m, k)$th element of $T$ is equal to one if, after the translation by $\xi_{k-1}$, the $m$th virtual ULA sensor becomes a part of the $k$th subarray, and equal to zero otherwise. For example, for the particular array configuration shown in Fig. 2, the selection matrix is given by

$$T = \begin{bmatrix} 1&0&0&0&0&0&1&0&0&1 \\ 0&1&0&0&1&1&0&0&0&0 \\ 0&0&1&1&0&0&0&0&1&0 \\ 0&0&0&0&0&0&0&1&0&0 \end{bmatrix}^T$$
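The factorization $a(\theta, \alpha) = Q(\theta)\, T\, h(\theta, \alpha)$ can be sketched in a few lines. The example uses the selection matrix quoted above; the displacement values, the baseline and the direction are arbitrary illustrative numbers, and `steering_vector` is a hypothetical helper name.

```python
import numpy as np

# Selection matrix of the example above (M = 10 virtual sensors, K = 4 subarrays).
T = np.array([
    [1, 0, 0, 0, 0, 0, 1, 0, 0, 1],
    [0, 1, 0, 0, 1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
]).T

def steering_vector(theta, xi, d, lam, T):
    """a(theta, alpha) = Q(theta) T h(theta, alpha); xi holds one displacement vector per row."""
    M, K = T.shape
    phi = np.array([np.sin(theta), np.cos(theta)])
    # h: phase offsets of the subarrays; the first subarray is the reference (xi_0 = 0)
    h = np.concatenate(([1.0], np.exp(1j * 2 * np.pi / lam * (xi @ phi))))
    # diagonal of Q(theta): virtual ULA phases
    q_diag = np.exp(1j * 2 * np.pi / lam * d * np.sin(theta) * np.arange(M))
    return q_diag * (T @ h)

lam = 1.0                      # wavelength (illustrative units)
d = lam / 2                    # shortest baseline (illustrative)
theta = np.radians(10.0)
xi = np.array([[3.2, 1.1], [-2.5, 4.0], [0.7, -6.3]])   # hypothetical displacements, K - 1 = 3 rows

a = steering_vector(theta, xi, d, lam, T)
a_no_shift = steering_vector(theta, np.zeros((3, 2)), d, lam, T)
ula = np.exp(1j * 2 * np.pi / lam * d * np.sin(theta) * np.arange(10))
print(a.shape)
```

With all displacements set to zero the vector reduces to the steering vector of the virtual ULA, which is a convenient check of the construction.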

According to (1), the array snapshots can be modeled as

$$x(t) = A(\theta, \alpha)\, s(t) + n(t), \qquad t = 1, 2, \ldots, N \qquad (2)$$

Fig. 2. Representation of the array structure of Fig. 1 by means of the virtual ULA (o) and three vector displacements $\xi_1$, $\xi_2$ and $\xi_3$.

where $A(\theta, \alpha) = [a(\theta_1, \alpha), \ldots, a(\theta_L, \alpha)]$ is the $M \times L$ direction matrix composed of the signal direction vectors $\{a(\theta_l, \alpha)\}_{l=1}^{L}$, $s(t)$ is the $L \times 1$ vector of the signal waveforms, $n(t)$ is the $M \times 1$ vector of white sensor noise, and $N$ is the number of snapshots. The sample estimate of the covariance matrix

$$R = E\{x(t) x^H(t)\} = A(\theta, \alpha)\, S\, A^H(\theta, \alpha) + \sigma^2 I \qquad (3)$$

is given by

$$\hat{R} = \frac{1}{N} \sum_{t=1}^{N} x(t) x^H(t) \qquad (4)$$

where $S = E\{s(t) s^H(t)\}$ is the $L \times L$ source covariance matrix, $I$ is the identity matrix, $(\cdot)^H$ is the Hermitian transpose, and $E\{\cdot\}$ denotes the statistical expectation. The noise is assumed to be a zero-mean spatially white process with the identical variance $\sigma^2$ in each sensor.

Write the eigendecompositions of the matrices (3) and (4) as

$$R = E_S \Lambda_S E_S^H + E_N \Lambda_N E_N^H \qquad (5)$$

$$\hat{R} = \hat{E}_S \hat{\Lambda}_S \hat{E}_S^H + \hat{E}_N \hat{\Lambda}_N \hat{E}_N^H \qquad (6)$$

where the $L \times L$ diagonal matrices $\Lambda_S$ and $\hat{\Lambda}_S$ contain the $L$ signal-subspace eigenvalues of $R$ and $\hat{R}$, respectively, and the $(M - L) \times (M - L)$ diagonal matrices $\Lambda_N$ and $\hat{\Lambda}_N$ contain the $M - L$ noise-subspace eigenvalues of $R$ and $\hat{R}$, respectively. In turn, the columns of the $M \times L$ matrices $E_S$ and $\hat{E}_S$ contain the signal-subspace eigenvectors of $R$ and $\hat{R}$, respectively, whereas the $M \times (M - L)$ matrices $E_N$ and $\hat{E}_N$ are composed of the noise-subspace eigenvectors of $R$ and $\hat{R}$, respectively.

Let us consider the conventional spectral MUSIC algorithm which estimates the signal DOAs from the $L$ deepest minima of the function

$$f_{MUSIC}(\theta) = a^H(\theta, \alpha)\, \hat{E}_N \hat{E}_N^H\, a(\theta, \alpha) \qquad (7)$$

be the parameter determining whether the $k$th subarray manifold is unambiguous. Then, provided that the following condition is satisfied


In the ideal case of exactly known $R$, the DOAs can be found from the equation

$$a^H(\theta, \alpha)\, E_N E_N^H\, a(\theta, \alpha) = 0 \qquad (8)$$

Since the vector parameter $\alpha$ is unknown, the minimization of (7) requires an exhaustive $(2(K-1) + 1)$-dimensional search, which becomes totally impractical for $K > 1$. Inserting (1), we can rewrite (8) as

$$h^H(\theta, \alpha)\, B(z)\, h(\theta, \alpha) = 0 \qquad (9)$$

where

$$B(z) = T^T Q(1/z)\, E_N E_N^H\, Q(z)\, T \qquad (10)$$

is the $K \times K$ Hermitian matrix, $Q$ is reformulated in terms of $z = e^{j(2\pi/\lambda) d \sin\theta}$ as $Q(z) = \mathrm{diag}\{1, z, \ldots, z^{M-1}\}$ and the obvious property $Q^H(z) = Q(1/z)$ is used. A very important observation here is that the vector parameter $\alpha$ is contained in $h(\theta, \alpha)$ only, so that the matrix $B(z)$ is independent of $\alpha$.

It is worth noting that $B(z)$ in the general case is full rank. Note that (9) may hold true only if the matrix $B(z)$ drops rank, so that $\mathrm{rank}\{B(z)\} < K$ or, equivalently, when the polynomial


$$L \le \sum_{k=1}^{K} \beta_k (M_k - 1) \qquad (15)$$

equation (9) holds true if and only if $P(z)|_{|z|=1} = 0$. □

An important corollary following from this theorem is that the signal DOAs can be found by rooting the polynomial $P(z)$ without exploiting any knowledge of the inter-subarray displacement parameters $\alpha$. From (15) we obtain that $L \le \sum_{k=1}^{K} \beta_k (M_k - 1) \le \sum_{k=1}^{K} (M_k - 1) = M - K$. Therefore, condition (15) can be interpreted as a strengthened version of the necessary condition $K \le M - L$ discussed in [1]. Furthermore, (15) simplifies to the latter inequality $K \le M - L$ in the case when all subarrays are unambiguous, i.e. $\beta_k = 1$, $k = 1, 2, \ldots, K$.


3.  IMPLEMENTATION 

To apply RARE to the finite observation case (where only the sample covariance matrix (4) is available), we should root the sample polynomial

$$\hat{P}(z) = \det\{\hat{B}(z)\} \qquad (16)$$

where

$$P(z) = \det\{B(z)\} = 0 \qquad (11)$$

$$\hat{B}(z) = T^T Q^T(1/z)\, \hat{E}_N \hat{E}_N^H\, Q(z)\, T \qquad (17)$$


An important question now is whether $B(z)$ may drop rank for some values of $z$ which lie on the unit circle but do not nullify the MUSIC polynomial $f_{MUSIC}(z)$. The following theorem answers this question.

Theorem: Let

$$v(z) = [1, z, \ldots, z^{M-1}]^T \qquad (12)$$

be the $M \times 1$ virtual ULA steering vector written in terms of $z$ and

$$d_k(z) = [d_{k,1}(z), d_{k,2}(z), \ldots, d_{k,M_k}(z)]^T \qquad (13)$$

be the $M_k \times 1$ vector composed of $v(z)$ by taking into account only the sensors of the $k$th subarray. Let


and then find the signal DOAs from the $L$ roots of (16) which are closest to the unit circle (in a similar way as in root-MUSIC). Note that, similar to root-MUSIC, the RARE roots enjoy the conjugate reciprocity property, i.e. if $z$ is a root of $\hat{P}(z)$ then $1/z^*$ is also a root of $\hat{P}(z)$. Therefore, to obtain the signal DOAs, it is sufficient to examine the roots of $\hat{P}(z)$ inside the unit circle.

Another interesting observation is that $\hat{P}(z)$ reduces to the root-MUSIC polynomial in the case $K = 1$ [1].
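For the special case $K = 1$, the rooting step can be sketched as ordinary root-MUSIC on a fully calibrated ULA. The example below is only an illustration under stated assumptions: a half-wavelength ULA, two uncorrelated unit-power sources and a simulated sample covariance. The function names, directions, SNR and array size are hypothetical choices and do not reproduce the paper's simulation setup.

```python
import numpy as np

def simulate_snapshots(doas_deg, M, N, snr_db, rng, d_over_lambda=0.5):
    """Snapshots x(t) = A s(t) + n(t) for a ULA with steering entries z^m, z = exp(j*2*pi*(d/lambda)*sin(theta))."""
    theta = np.radians(doas_deg)
    A = np.exp(1j * 2 * np.pi * d_over_lambda * np.outer(np.arange(M), np.sin(theta)))
    L = len(doas_deg)
    S = (rng.standard_normal((L, N)) + 1j * rng.standard_normal((L, N))) / np.sqrt(2)
    sigma = 10 ** (-snr_db / 20)
    noise = sigma * (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
    return A @ S + noise

def root_music(R_hat, n_sources, d_over_lambda=0.5):
    """DOA estimates (degrees) from the roots closest to the unit circle, taken from inside it."""
    M = R_hat.shape[0]
    _, eigvecs = np.linalg.eigh(R_hat)          # eigenvalues in ascending order
    E_n = eigvecs[:, :M - n_sources]            # noise-subspace eigenvectors
    C = E_n @ E_n.conj().T
    # v^T(1/z) C v(z) = sum_k c_k z^k, where c_k is the sum of the k-th diagonal of C
    coeffs = np.array([np.trace(C, offset=k) for k in range(M - 1, -M, -1)])
    roots = np.roots(coeffs)
    inside = roots[np.abs(roots) < 1.0]
    closest = inside[np.argsort(1.0 - np.abs(inside))[:n_sources]]
    sin_theta = np.angle(closest) / (2 * np.pi * d_over_lambda)
    return np.sort(np.degrees(np.arcsin(sin_theta)))

rng = np.random.default_rng(0)
X = simulate_snapshots([-20.0, 30.0], M=8, N=200, snr_db=20.0, rng=rng)
R_hat = X @ X.conj().T / X.shape[1]
doa_est = root_music(R_hat, n_sources=2)
print(doa_est)
```

For $K > 1$ the same selection rule applies, but the polynomial to be rooted is $\det\{\hat{B}(z)\}$ instead of the scalar MUSIC polynomial.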

Two  important  questions  arise  when  implementing  the 
RARE  estimator.  First  of  all,  it  is  important  to  determine  the 
degree  of  the  RARE  polynomial.  Second,  a  low-complexity 
algorithm  to  compute  the  coefficients  of  this  polynomial  is 
required.  These  issues  are  addressed  below. 


$$\tilde{d}_k(z) = d_k(z) / d_{k,1}(z) \qquad (14)$$

3.1. Degree of the RARE Polynomial

be the steering vector of the $k$th subarray with the phase origin in the first sensor of this subarray, and

$$\beta_k = \begin{cases} 1, & \tilde{d}_k(z) = \tilde{d}_k(z') \text{ if and only if } \theta = \theta' \\ 0, & \text{otherwise} \end{cases}$$


It can be readily shown that

$$D_{RARE} = \sum_{k=1}^{K} D_{k,k} \qquad (18)$$

where $D_{RARE}$ is the degree of $\hat{P}(z)$,

$$D_{l,k} = \mathrm{degree}\left\{d_l^T(1/z)\, \hat{U}_l \hat{U}_k^H\, d_k(z)\right\} \qquad (19)$$

and the matrix $\hat{U}_l$ is composed of $\hat{E}_N$ by taking only the rows of the latter matrix which correspond to the $l$th subarray. From (18)-(19), it follows that each particular degree $D_{k,k}$ and, therefore, the total degree $D_{RARE}$ essentially depend on how the subarrays have been chosen.

3.2.  Computing  the  RARE  Polynomial  Coefficients 

Let  us  consider  bk,i,  the  (k.  Z)th  element  of  the  polynomial 
matrix  B(z).  It  can  be  written  as  the  polynomial 

bkAz)  =  dl(l/z)UkU?d,(z)  (20) 

Let  pkj  (n)  denote  the  polynomial  coefficient  of  bkj (z)  that 
corresponds  to  zn.  It  can  be  readily  verified  that  Pkj(n) 
is  zero  if  zn  is  not  representable  as  a  product  of  any  two 
elements  of  vectors  dk(l/z)  and  di(z).  However,  if  zn  can 
be  represented  as  such  product,  i.e. 

Zn  =  dk'P(l/z)dpm{z )  (21) 

for  some  p  €  {1,2 ...  ,Mk}  and  m  €  {1,2...  ,  M;}  then 

the  coefficient  pk,i{n)  is  nonzero  and  can  be  computed  as 

A  A 

a  sum  of  (p,  m)th  elements  of  the  matrix  UkUi  over  all 
pairs  {p,m}  of  indices  which  satisfy  (21).  Making  use  of 
the  well-known  recursive  formula  for  computing  determi¬ 
nants,  we  can  write 
K 

P(z)  =  5](-1)fc+16fc,1(^)det{DM(^)}  (22) 

k= 1 

where  Bk,m(z)  is  the  (K  —  1)  x  (K  —  1)  matrix  obtained 
from  B(z)  by  deleting  its  fcth  row  and  mth  column. 

Let  Pg{n)  denote  the  coefficient  of  the  polynomial  P(z) 
which  corresponds  to  zn.  Similarly,  let  (n)  denote 
the  polynomial  coefficient  of  det  { Bk,m{z )}  which  corre¬ 
sponds  to  zn.  Note  that  the  sequences  of  polynomial  co¬ 
efficients  and  the  polynomials  themselves  form  the  follow¬ 
ing  ^-transform  pairs:  bk,m(z)  =  Z{pkiJn(n)},  P{z)  = 
^{PbW}’  311(1  det{J5fc;ra(z)}  =  Z{pgkm{n)}.  Using 
(22)  and  the  convolution  property  of  z -transform,  we  obtain 
K 

Pb(H)  =  ^(-1)*+1Pfc,1(n)  *PBfc,a(n)  (23) 

k=l 

Equation (23) is the core formula for computing the RARE polynomial coefficients. It relates the coefficients of the polynomial $\det\{B(z)\}$ to the coefficients of the polynomials $\det\{B_{k,1}(z)\}$. Obviously, (23) can be applied recursively to compute the coefficients of each polynomial $\det\{B_{k,1}(z)\}$ via the coefficients of the polynomials formed by the determinants of submatrices of $B_{k,1}(z)$, and so on.
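
As a rough illustration of the recursion behind (22)-(23), the following sketch computes the coefficients of a determinant polynomial by cofactor expansion along the first column, with each polynomial product carried out as a convolution of coefficient sequences. It assumes, for simplicity, that every matrix entry is an ordinary polynomial in $z$ stored as an ascending coefficient array (entries with negative powers of $z$ would first need a common shift); the names `poly_add` and `det_poly` are illustrative and do not come from the paper.

```python
import numpy as np

def poly_add(a, b):
    """Add two coefficient arrays (ascending powers of z) of possibly different length."""
    out = np.zeros(max(len(a), len(b)), dtype=complex)
    out[:len(a)] += a
    out[:len(b)] += b
    return out

def det_poly(B):
    """Coefficients of det{B(z)} for a K x K nested list of coefficient arrays.

    Cofactor expansion along the first column, as in (22); the product of two
    polynomials becomes a convolution of their coefficients, as in (23).
    """
    K = len(B)
    if K == 1:
        return np.asarray(B[0][0], dtype=complex)
    total = np.zeros(1, dtype=complex)
    for k in range(K):
        # Minor B_{k,1}: delete row k and the first column.
        minor = [row[1:] for i, row in enumerate(B) if i != k]
        term = np.convolve(np.asarray(B[k][0], dtype=complex), det_poly(minor))
        total = poly_add(total, (-1) ** k * term)   # (-1)**k here equals (-1)^{k+1} for 1-based k
    return total

# Example: B(z) = [[z, 1], [1, z]] has determinant z^2 - 1.
B = [[[0, 1], [1]],
     [[1], [0, 1]]]
coeffs = det_poly(B)   # ascending powers: constant term, z, z^2
```

The plain recursion costs on the order of $K!$ convolutions, which is unproblematic for a handful of subarrays but would need a different strategy for large $K$.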


Fig.  3.  The  RMSE  of  RARE  and  the  CRB  [1]  versus  the 
SNR. 

4.  SIMULATIONS 

We assume an array of $M = 6$ sensors which is composed of $K = 3$ two-element subarrays ($M_1 = M_2 = M_3 = 2$). The interelement spacings of the first, second, and third subarrays are $d = \lambda/6$, $2d = \lambda/3$, and $3d = \lambda/2$, respectively. The inter-subarray displacements are unknown and equal to $\xi_1 = [7.56d,\ 25.43d]$ and $\xi_2 = [0.93d,\ -12.27d]$, respectively.

Two uncorrelated equi-powered sources are assumed to impinge on the array from $\theta_1 = 5^\circ$ and $\theta_2 = 11^\circ$. All results are averaged over 100 simulation runs, and $N = 100$ is taken. In Fig. 3, the RMSE of RARE is displayed versus the SNR along with the deterministic CRB derived in [1] for the case of unknown inter-subarray displacements. Our simulations show that the performance of RARE is very close to the corresponding CRB. It is worth noting that RARE appears to be the only method applicable in this scenario.

ABSTRACT

This work is concerned with recursive procedures in which the data run through sequentially. Two stochastic approximation recursions derived from the EM and SAGE algorithms are proposed. We show that under regularity conditions, these recursions lead to strong consistency and asymptotic normality. Although the recursive EM and SAGE algorithms do not have the optimal convergence rate, they are usually easy to implement. As an example, we derive recursive procedures for direction of arrival (DOA) estimation. In numerical experiments both algorithms provide good results at low computational cost.

1.  INTRODUCTION 

This work is concerned with recursive parameter estimation using augmented data. The EM algorithm [1] is a well-known numerical procedure to locate modes of a likelihood function which is characterized by its simple implementation and stability. One of its variants, the SAGE algorithm [3], generalizes the idea of data augmentation to facilitate more flexible choices of parameter sets and faster convergence rates in some settings.

Both algorithms provide iterative estimates based on the same batch of data. If the data sets are large, these procedures can become expensive. To solve this problem we propose two stochastic approximation recursions derived from the EM and SAGE algorithms in which the data arrive sequentially.

The first recursive EM algorithm was suggested by Titterington [6], where the step size is limited to $a_n = n^{-1}$. Consistency and asymptotic normality were only presented for the univariate version. Here we consider a more general case where $a_n = a n^{-\alpha}$ and generalize the asymptotic properties to the multivariate case.

Furthermore, we propose a new recursion derived from the SAGE algorithm in which a more flexible choice of parameter sets is allowed. Under regularity conditions the sequence of estimates generated by the recursive SAGE algorithm enjoys strong consistency and asymptotic normality as well.

Compared to the stochastic approximation procedure with optimal convergence rate, where the inversion of the observed information matrix is necessary [2], the inverses of the augmented information matrices used in the recursive EM and SAGE algorithms are usually much easier to compute.

As an illustrative example, we apply the proposed algorithms to direction of arrival (DOA) estimation. Because of the diagonal structure of the augmented information matrices, the recursive procedures have very simple implementations. In numerical experiments, we consider a critical scenario in which two sources are closely located. Results show that both algorithms converge to the true parameters and the mean squared errors decrease with time.

This paper is organized as follows. The recursive EM and SAGE algorithms are developed in Sections 2 and 3. Consistency and asymptotic normality of both algorithms are proved in Section 4. In Section 5, we derive recursive procedures for DOA estimation. Numerical results are presented in Section 6.

This work has been supported by the German Science Foundation.

2.  RECURSIVE  EM  ALGORITHM 

Suppose $x_1, x_2, \ldots$ are independent observations, each with underlying probability density function (p.d.f.) $f_X(x,\theta)$, where $\theta \in \mathbb{R}^m$ denotes an unknown parameter vector. $Y$ is the augmented data used by the EM algorithm with p.d.f. $f_Y(y,\theta)$. Let $\theta^n$ denote the estimate after $n$ observations altogether. The following recursion is aimed at finding the maximizing parameter $\theta = \theta^*$ of $\log f_X(x,\theta)$:

$$\theta^{n+1} = \theta^n - a n^{-\alpha}\, I_{\mathrm{EM}}(\theta^n)^{-1}\, \gamma(x_n,\theta^n), \tag{1}$$

where $a > 0$ is a constant and

$$\gamma(x_n,\theta^n) = -\nabla_\theta \log f_X(x_n,\theta)\big|_{\theta=\theta^n}, \tag{2}$$

$$I_{\mathrm{EM}}(\theta^n) = E\left[-\nabla_\theta \nabla_\theta^T \log f_Y(y,\theta) \,\middle|\, x_n, \theta\right]\Big|_{\theta=\theta^n} \tag{3}$$

represent the gradient vector of the log-likelihood of the observation at time $n$ and the Fisher information matrix corresponding to the augmented data $Y$, respectively. $\nabla_\theta$ is a column gradient operator with respect to $\theta$. The choice of $\alpha$ depends on the matrix

$$D_{\mathrm{EM}}(\theta) = \tfrac{1}{2} I - a\, I_{\mathrm{EM}}(\theta)^{-1} I(\theta) \tag{4}$$

where $I(\theta)$ and $I$ denote the Fisher information matrix corresponding to one observation and the identity matrix, respectively. Use $\alpha = 1$ if $D_{\mathrm{EM}}(\theta)$ is a stable matrix and otherwise $1/2 < \alpha < 1$. A matrix is called stable if all eigenvalues have negative real parts [5].
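
To make recursion (1) concrete, the sketch below runs it on a toy problem that is not taken from the paper: independent Gaussian observations with unknown mean and known variance. For this toy model the augmented information matrix is simply assumed to equal $I/\sigma^2$, and with $a = 1$, $\alpha = 1$ the recursion reduces to the running sample mean. The function and argument names are illustrative.

```python
import numpy as np

def recursive_em(xs, theta0, gamma, info_aug, a=1.0, alpha=1.0):
    """Run recursion (1): theta <- theta - a * n**(-alpha) * I_EM(theta)^{-1} gamma(x_n, theta)."""
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    for n, x in enumerate(xs, start=1):
        step = a * n ** (-alpha)
        # Solve I_EM(theta) v = gamma instead of forming the inverse explicitly.
        theta = theta - step * np.linalg.solve(info_aug(theta), gamma(x, theta))
    return theta

# Toy model (assumption): x_n ~ N(theta, sigma^2 I) with known sigma.
sigma = 2.0
gamma = lambda x, th: -(x - th) / sigma**2          # negative score, matching the sign in (2)
info_aug = lambda th: np.eye(th.size) / sigma**2    # stand-in for I_EM in this toy model

rng = np.random.default_rng(0)
xs = 1.5 + sigma * rng.standard_normal((500, 2))
theta_hat = recursive_em(xs, theta0=[0.0, 0.0], gamma=gamma, info_aug=info_aug)
```

In a real application `gamma` and `info_aug` would be supplied by the model at hand; each update then costs one gradient evaluation and one linear solve, which is trivial when the augmented information matrix is diagonal.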

As pointed out by Titterington [6], there is a strong relationship between recursion (1) and the EM algorithm. Consistency and asymptotic normality for the univariate version of (1) with $\alpha = 1$ were also presented in [6], [7]. These results will be generalized in Section 4.

3.  RECURSIVE  SAGE  ALGORITHM

The space alternating generalized EM (SAGE) algorithm [3] generalizes the idea of data augmentation to simplify computations of the EM algorithm. It preserves the stability of EM and can improve the convergence rate significantly in some settings. Instead of estimating all parameters at once, each iteration of SAGE consists of $C$ cycles. The parameter subset $\theta_c$ associated with the $c$-th cycle is updated by maximizing the conditional expectation of the log-likelihood of the augmented data $Z_c$ corresponding to this cycle.

To avoid mathematical difficulties, we will only consider the case $\theta = (\theta_1, \ldots, \theta_C)$ where the parameter subsets are disjoint. Let $Z_c$ denote the augmented data of the $c$-th cycle with p.d.f. $f_{Z_c}(z_c,\theta_c)$. The recursive version of SAGE is based on the following procedure.

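At time instant $n+1$ with the current estimate $\theta^n$, define

$$L_{n+1}(\theta) = \sum_{c=1}^{C} E\left[\log f_{Z_c}(z_c,\theta_c) \,\middle|\, x_n, \theta^n\right] + L_n(\theta). \tag{5}$$

Choose $\theta^{n+1}$ by maximizing $L_{n+1}(\theta)$.

Applying a Taylor expansion to the right-hand side of (5), the recursion can be approximated by

$$\theta^{n+1} = \theta^n - (n+1)^{-1}\, I_{\mathrm{SAGE}}(\theta^n)^{-1}\, \gamma(x_n,\theta^n), \tag{6}$$

where $I_{\mathrm{SAGE}}$ is a block diagonal matrix with the $c$-th block

$$I_{\mathrm{SAGE}}^{c}(\theta^n) = E\left[-\nabla_{\theta_c} \nabla_{\theta_c}^T \log f_{Z_c}(z_c,\theta_c) \,\middle|\, x_n, \theta\right]\Big|_{\theta=\theta^n}. \tag{7}$$

Without losing the asymptotic properties of $\theta^n$, (6) can be used with a more flexible step size, i.e.

$$\theta^{n+1} = \theta^n - a n^{-\alpha}\, I_{\mathrm{SAGE}}(\theta^n)^{-1}\, \gamma(x_n,\theta^n). \tag{8}$$

Use $\alpha = 1$ if

$$D_{\mathrm{SAGE}}(\theta) = \tfrac{1}{2} I - a\, I_{\mathrm{SAGE}}(\theta)^{-1} I(\theta) \tag{9}$$

is a stable matrix and otherwise $1/2 < \alpha < 1$.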
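
The step-size rule attached to (4) and (9) can be checked numerically once the two information matrices are available. The sketch below is only an illustration under that assumption: it forms $D = \tfrac{1}{2}I - a\,I_{\mathrm{aug}}^{-1} I$ and tests whether all eigenvalues have negative real parts. The fallback exponent 0.75 is an arbitrary value inside $(1/2, 1)$, not one recommended by the paper, and the function names are hypothetical.

```python
import numpy as np

def is_stable(D):
    """A matrix is called stable if all its eigenvalues have negative real parts."""
    return bool(np.all(np.linalg.eigvals(D).real < 0))

def choose_alpha(info_aug, info_obs, a, fallback=0.75):
    """Return alpha = 1 if D = I/2 - a * info_aug^{-1} info_obs is stable, else a value in (1/2, 1)."""
    m = info_obs.shape[0]
    D = 0.5 * np.eye(m) - a * np.linalg.solve(info_aug, info_obs)
    return 1.0 if is_stable(D) else fallback

# Illustrative numbers: the augmented information is twice the observed information.
info_obs = np.eye(2)
info_aug = 2.0 * np.eye(2)
alpha_large_a = choose_alpha(info_aug, info_obs, a=3.0)   # D = -I, stable
alpha_small_a = choose_alpha(info_aug, info_obs, a=0.5)   # D = +0.25 I, not stable
```

In this example a larger constant $a$ compensates for the augmented information exceeding the observed information, which is why the first call permits $\alpha = 1$ and the second does not.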

4.  CONVERGENCE  AND  ASYMPTOTIC  DISTRIBUTION 

In this section we study the asymptotic behavior of $\{\theta^n\}$ generated by the recursive EM algorithm (1) and the recursive SAGE algorithm (8), respectively. Based on convergence results from stochastic approximation [4], [5], it will be shown that $\theta^n$ converges with probability one to $\theta^*$ and is asymptotically normally distributed.

To begin with, define

$$g(\theta) = E\left[-\nabla_\theta \log f_X(x,\theta)\right] = \nabla_\theta J(\theta,\theta^*) \tag{10}$$

where

$$J(\theta,\theta^*) = \int \log\left[f_X(x,\theta^*)/f_X(x,\theta)\right] f_X(x,\theta^*)\,dx \tag{11}$$

is the Kullback-Leibler divergence between $f_X(x,\theta^*)$ and $f_X(x,\theta)$. It is well known that under regularity conditions $J(\theta,\theta^*) \ge 0$, with equality if and only if $\theta = \theta^*$. Therefore $g(\theta^*) = 0$, $I_{\mathrm{EM}}(\theta^*)^{-1} g(\theta^*) = 0$ and $I_{\mathrm{SAGE}}(\theta^*)^{-1} g(\theta^*) = 0$. Clearly, (1) and (8) are recursive procedures to find the roots of $U_1(\theta) = I_{\mathrm{EM}}(\theta)^{-1} g(\theta)$ and $U_2(\theta) = I_{\mathrm{SAGE}}(\theta)^{-1} g(\theta)$, respectively.

In the following, we will consider the strong consistency and asymptotic normality of $\theta^n$ generated by (1). These properties also hold for $\theta^n$ generated by (8). Making use of results from stochastic approximation [4], we obtain the following.

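Theorem 1 Suppose (a) $E\left[\gamma(x_n,\theta^n)\gamma(x_n,\theta^n)^T\right] < \infty$ and (b) $I_{\mathrm{EM}}(\theta) > 0$ hold for recursion (1). Then $\theta^n$ converges with probability one to $\theta^*$.

Proof

(1) $\sum_{n=1}^{\infty} a n^{-\alpha} = \infty$, $a n^{-\alpha} > 0$, $a n^{-\alpha} \to 0$, $\forall n > 0$.

(2) Under the assumption $E\left[\gamma(x_n,\theta^n)\gamma(x_n,\theta^n)^T\right] < \infty$, we have $E\left[\|I_{\mathrm{EM}}(\theta^n)^{-1}\gamma(x_n,\theta^n)\|^2\right] < \infty$, $\forall n > 0$.

(3) Since $x_1, x_2, \ldots$ are mutually independent,

$$I_{\mathrm{EM}}(\theta^n)^{-1}\gamma(x_n,\theta^n) = I_{\mathrm{EM}}(\theta^n)^{-1} g(\theta^n) + \delta M_n \tag{12}$$

where $\delta M_n$ is a martingale difference.

With (1), (2), (3) and Theorem 2.1 in [4], $\theta^n$ converges to $\theta^*$ with probability one. □

Remark: The result of Theorem 1 holds for the recursive SAGE algorithm if $I_{\mathrm{EM}}(\theta)$ in (b) is replaced by $I_{\mathrm{SAGE}}(\theta)$.

The next theorem is concerned with the asymptotic properties of normalized errors about the limit point $\theta^*$. Let $U_i'(\theta)$, $(i = 1, 2)$ denote the Jacobi matrix of $U_i(\theta)$ and $\mathcal{N}(\cdot,\cdot)$ the normal distribution.

Theorem 2 Suppose (a) $E\left[\gamma(x_n,\theta^n)\gamma(x_n,\theta^n)^T\right] < \infty$ and (b) finite $U_1'(\theta)$ hold for (1). Then (i) if $\alpha = 1$ and $D_{\mathrm{EM}}$ is a stable matrix, $n^{1/2}(\theta^n - \theta^*)$ has asymptotic distribution $\mathcal{N}(0, V)$ where $V$ is the solution of

$$(aA - \tfrac{1}{2}I)V + V(aA - \tfrac{1}{2}I)^T = a^2 C \tag{13}$$

with $A = I_{\mathrm{EM}}(\theta^*)^{-1} I(\theta^*)$, $C = I_{\mathrm{EM}}(\theta^*)^{-1} I(\theta^*) I_{\mathrm{EM}}(\theta^*)^{-1}$; (ii) if $1/2 < \alpha < 1$, $n^{\alpha/2}(\theta^n - \theta^*)$ has asymptotic distribution $\mathcal{N}(0, \tilde{V})$ where $\tilde{V}$ is the solution of

$$A\tilde{V} + \tilde{V}A^T = aC. \tag{14}$$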

Proof By the mean value theorem and the finite matrix $U_1'(\theta)$, one can show that for $\theta$ near $\theta^*$

(1) $k_1\|\theta - \theta^*\|^2 \le (\theta - \theta^*)^T U_1(\theta) \le k_2\|\theta - \theta^*\|^2$,

(2) $\|U_1(\theta)\| \le k_3\|\theta - \theta^*\| + k_4$, where $k_j$, $j = 1, \ldots, 4$ are constants.

(3) $E\left[\|\delta M_n\|^2\right] < \infty$ under assumption (a).

(4) By Taylor expansion around $\theta^*$, it can be shown that $U_1(\theta) = A(\theta - \theta^*) + o(\|\theta - \theta^*\|)$.

From (1), (2), (3), (4) and Theorems 5.8, 5.9 of [5], we conclude the asymptotic normality of $\theta^n$. □

Remark: The results of Theorem 2 hold for the recursive SAGE algorithm if $U_1'(\theta)$ in (b) is replaced by $U_2'(\theta)$, $D_{\mathrm{EM}}$ is replaced by $D_{\mathrm{SAGE}}$ and $A = I_{\mathrm{SAGE}}(\theta^*)^{-1} I(\theta^*)$.

As noted in [4], the asymptotic variances $V$ or $\tilde{V}$ are a measure of the rate of convergence. The optimal covariance is achieved when $I_{\mathrm{EM}}(\theta^n)$ in (1) is replaced by $I(\theta^n)$ [2]. However, the inverses of the augmented information matrices $I_{\mathrm{EM}}(\theta^n)$ and $I_{\mathrm{SAGE}}(\theta^n)$ are in general much easier to compute than that of $I(\theta^n)$. By proper choice of the augmentation scheme, the recursive procedures (1) and (8) can be implemented easily.

5.  APPLICATION  TO  DOA  ESTIMATION

In previous sections we were only concerned with the theoretical aspects of the recursive EM and SAGE. To show their potential in solving practical problems, we will derive recursive procedures for DOA estimation under a deterministic signal model and independent Gaussian noise. In this case, the array outputs $X(n)$, $n = 1, 2, \ldots$ are independent but not identically distributed. This difficulty can be overcome by considering the concentrated log-likelihood function. The purpose of the recursive procedures is to find the maxima of the concentrated log-likelihood function. The information matrices are used as scaling factors to improve the convergence rate.

Consider an array of $N$ sensors receiving signals generated by $M$ far-field narrowband sources. For signals arriving from $\theta = (\theta_1, \ldots, \theta_M)$ the array output $X(n)$ at time $n$ can be described as

$$X(n) = H(\theta)s(n) + U(n), \tag{15}$$

where $H(\theta) = [d(\theta_1), \ldots, d(\theta_M)] \in \mathbb{C}^{N \times M}$ contains $M$ steering vectors $d(\theta_m) \in \mathbb{C}^{N \times 1}$ $(m = 1, \ldots, M)$, and $s(n) = [s_1(n), \ldots, s_M(n)]^T \in \mathbb{C}^{M \times 1}$ and $U(n) \in \mathbb{C}^{N \times 1}$ denote the signal waveforms and the noise vector, respectively. $s(n)$ is considered deterministic and unknown. $U(n)$ is independent and $\mathcal{N}(0, \nu I)$ distributed with $\nu$ being known. The problem is to estimate $\theta$ from the observations $X(1), X(2), \ldots$. Let $P(\theta) = H(\theta)(H(\theta)^H H(\theta))^{-1} H(\theta)^H$. Omitting the constant terms, the concentrated log-likelihood of $X(n)$ is given by

$$L(\theta) = \frac{1}{\nu} X(n)^H P(\theta) X(n). \tag{16}$$

Taking the first derivative of $L(\theta)$, the $m$-th element of the gradient vector $g(\theta)$ can be written approximately as

$$g_m(\theta) = \cdots\,(d'(\theta_m)\hat{s}_m(n))^H (X(n) - H(\theta)\hat{s}(n)) \tag{17}$$

where $\hat{s}(n) = (H(\theta)^H H(\theta))^{-1} H(\theta)^H X(n)$, $\hat{s}_m(n)$ is the $m$-th element of $\hat{s}(n)$ and $d'(\theta_m) = \frac{\partial}{\partial\theta_m} d(\theta_m)$.

The EM algorithm has the augmented data

$$Y(n) = [Y_1(n)^T \ldots Y_m(n)^T \ldots Y_M(n)^T]^T \tag{18}$$

where $Y_m(n) = d(\theta_m)s_m(n) + U_m(n)$. $U_m(n)$ is $\mathcal{N}(0, \frac{\nu}{M} I)$ distributed. Let $d''(\theta_m) = \frac{\partial^2}{\partial\theta_m^2} d(\theta_m)$. The information matrix $I_{\mathrm{EM}}(\theta)$ is a diagonal matrix with the $m$-th diagonal element

$$[I_{\mathrm{EM}}(\theta)]_{mm} = \cdots - \Re\{\cdots\} + M\|d'(\theta_m)\hat{s}_m(n)\|^2 \cdots \tag{19}$$

The augmented data used by SAGE is given by

$$Z_m(n) = d(\theta_m)s_m(n) + U(n). \tag{20}$$

The corresponding $I_{\mathrm{SAGE}}(\theta)$ is a diagonal matrix with the $m$-th diagonal element

$$[I_{\mathrm{SAGE}}(\theta)]_{mm} = \cdots + \|d'(\theta_m)\hat{s}_m(n)\|^2 \cdots \tag{21}$$

The recursive EM is then obtained by using (17) and (19) in the recursion (1). The recursive SAGE can be implemented by using (17), (21) and (8).

6.  SIMULATIONS

The recursive procedures developed in the previous section are applied to simulated data. The narrowband signals are received by a uniform linear array with 15 sensors. The signals are generated by 3 sources with equal power located at $\theta_{\mathrm{true}} = [24^\circ\ 28^\circ\ 43^\circ]$. Note that the first two sources are separated by only half a beamwidth. The initial estimate is given by $\theta^0 = [19^\circ\ 32^\circ\ 48^\circ]$. The signal to noise ratio (SNR) is kept at 0 and 10 dB. Each experiment runs through 50 Monte Carlo trials. The step size $a_n = a n^{-\alpha}$ is chosen with $a = 3$. Each recursion is performed from $n = 1$ to 100.

The mean values of estimates from 50 Monte Carlo trials are presented in Fig. 1 and Fig. 2. The mean squared errors (MSE) are displayed in Fig. 3 and Fig. 4. With increasing time, both algorithms lead to decreasing MSE. They also converge faster at the higher SNR. Given the same step size $a_n$, recursive SAGE has a better convergence rate than recursive EM. The diagonal augmented information matrices $I_{\mathrm{EM}}(\theta)$, $I_{\mathrm{SAGE}}(\theta)$ for this problem make the recursions (1), (8) very easy to compute. It is possible to use these recursions for real-time processing.

7.  CONCLUSION

The problem of recursive parameter estimation using augmented data was studied in this work. We proposed recursive EM and SAGE algorithms to facilitate sequential processing of arriving data. It was proved that under mild conditions sequences of estimates generated by the proposed algorithms are strongly consistent and asymptotically normally distributed. In addition, we applied them to DOA estimation and obtained good simulation results. With this example we demonstrated that recursive EM and SAGE algorithms can be useful numerical tools for practical applications.


Fig. 1. Recursive EM and SAGE with application to DOA Estimation. Mean of DOA estimates, $\theta_{\mathrm{true}} = [24^\circ\ 28^\circ\ 43^\circ]$, SNR = 0 dB.

Fig. 2. Recursive EM and SAGE with application to DOA Estimation. Mean of DOA estimates, $\theta_{\mathrm{true}} = [24^\circ\ 28^\circ\ 43^\circ]$, SNR = 10 dB.

Fig. 3. MSE at each recursion. SNR = 0 dB.

Fig. 4. MSE at each recursion. SNR = 0 dB.

ABSTRACT

Subband arrays have been proposed as a useful means to realize joint spatio-temporal domain equalization in digital mobile communications. They are used to mitigate channel impairment problems caused by inter-symbol interference (ISI) and co-channel interference (CCI). In this paper, we propose the normalized subband array and locally orthogonalized subband array techniques for channel equalization. The least mean square (LMS) algorithm is used for adaptation. The convergence performance of the proposed techniques is analyzed and compared with that of conventional space-time adaptive processing (STAP) techniques. It is shown that subband decompositions provide great flexibility in implementing spatio-temporal equalization. Both analytical and numerical simulation results demonstrate that the proposed subband array techniques substantially improve the convergence performance without significant additional computations.

1.  INTRODUCTION 

For high-speed digital wireless networks, the communication channels are often frequency-selective, and the inter-symbol interference (ISI) becomes highly pronounced. Another pressing problem in mobile communication is the co-channel interference (CCI), which is generated by frequency reuse in cellular systems. Space-time adaptive processing (STAP) systems prove useful in suppressing both the ISI and CCI, leading to increased capacity and range [1].

Subband adaptive arrays have been proposed as an alternative to STAP for a variety of purposes. In [2]-[4], the authors proposed to use subband arrays to realize joint spatial and temporal domain equalization. In [5], the steady state mean square error (MSE) performance was analyzed for different feedback schemes of subband arrays.

The work of Y. Zhang and M. G. Amin is supported by the Office of Naval Research under Grant N00014-98-1-0176 and the Air Force Research Laboratory under Grant no. F30602-00-1-0515.


In this paper, we propose the normalized least mean square (LMS) subband array and the locally orthogonalized LMS subband array techniques, and analyze their convergence properties in comparison with those of conventional STAP techniques. It is shown that the subband decomposition offers flexible implementation of the spatio-temporal equalization. The analysis and numerical simulations demonstrate that the proposed subband array techniques substantially improve the convergence performance without significant additional computations.

2.  SIGNAL  MODEL 

We consider a base station using an antenna array of $N$ sensors with $P$ users, where $P < N$. The signal of interest is denoted by $s_1(m)$, $m \in (-\infty, \infty)$, whereas the signals from other users are $s_p(m)$, $p = 2, \ldots, P$. Assume the symbol period $T$ is common for the $P$ users. Then, the received data vector at the array, $\mathbf{x}(t) = [x_1(t), x_2(t), \ldots, x_N(t)]^T$, can be expressed as

$$\mathbf{x}(t) = \sum_{p=1}^{P} \sum_{m=-\infty}^{\infty} s_p(m)\,\mathbf{h}_p(t - mT) + \mathbf{b}(t) \tag{1}$$

where

$s_p(m)$: information symbol of the $p$th user,
$\mathbf{h}_p(t)$: channel response vector (including pulse shaping filter) of the $p$th user,
$\mathbf{b}(t)$: additive noise vector.

We make the following assumptions.

A1) The user signals $s_p(m)$, $p = 1, 2, \ldots, P$, are wide-sense cyclo-stationary and i.i.d. (independent and identically distributed) with $E[s_p(m)s_p^*(m)] = 1$.

A2) All channels $\mathbf{h}_p(t)$, $p = 1, 2, \ldots, P$, are linear, quasi-static, and of a finite duration within $[0, D_p T]$. That is, $\mathbf{h}_p(t) = 0$, $p = 1, 2, \ldots, P$, for $t > D_p T$ or $t < 0$.

A3) The noise vector $\mathbf{b}(t)$ is zero-mean, temporally and spatially white with

$$E[\mathbf{b}(t)\mathbf{b}^T(t + \tau)] = \mathbf{0}, \quad \text{for any } \tau,$$

and

$$E[\mathbf{b}(t)\mathbf{b}^H(t + \tau)] = \sigma\,\delta(\tau)\,\mathbf{I}_N,$$

where the superscripts $T$ and $H$ denote transpose and conjugate transpose, respectively, $\delta(\cdot)$ is the delta function, and $\mathbf{I}_N$ is the $N \times N$ identity matrix.

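The data vector $\mathbf{x}(t)$ is sampled at $T_s = T/J$, where $J > 1$ is an integer representing the oversampling rate. Define

$$\mathbf{x}(n) = \left[\mathbf{x}^T(nT)\ \ \mathbf{x}^T(nT - T_s)\ \cdots\ \mathbf{x}^T(nT - (J-1)T_s)\right]^T$$

as the $JN \times 1$ input signal by considering the $J$ oversampling branches as virtual channels. Accordingly,

$$\mathbf{x}(n) = \sum_{p=1}^{P} \sum_{m=-\infty}^{\infty} s_p(m)\,\mathbf{h}_p(n - m) + \mathbf{b}(n) \tag{2}$$

where

$$\mathbf{h}_p(n) = \left[\mathbf{h}_p^T(nT)\ \ \mathbf{h}_p^T(nT - T_s)\ \cdots\ \mathbf{h}_p^T(nT - (J-1)T_s)\right]^T$$

and

$$\mathbf{b}(n) = \left[\mathbf{b}^T(nT)\ \ \mathbf{b}^T(nT - T_s)\ \cdots\ \mathbf{b}^T(nT - (J-1)T_s)\right]^T.$$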

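By storing $M$ symbols (i.e., $JM$ snapshots) of the data vectors, we obtain the following expression

$$\underline{\mathbf{x}}(n) = \sum_{p=1}^{P} \mathbf{H}_p\,\mathbf{s}_p(n) + \underline{\mathbf{b}}(n), \tag{3}$$

where

$$\underline{\mathbf{x}}(n) = \left[\mathbf{x}^T(n)\ \ \mathbf{x}^T(n-1)\ \cdots\ \mathbf{x}^T(n - M + 1)\right]^T,$$

$$\mathbf{H}_p = \begin{bmatrix} \mathbf{h}_p(0) & \cdots & \mathbf{h}_p(D_p) & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{h}_p(0) & \cdots & \mathbf{h}_p(D_p) & \cdots & \mathbf{0} \\ \vdots & \ddots & \ddots & & \ddots & \vdots \\ \mathbf{0} & \cdots & \mathbf{0} & \mathbf{h}_p(0) & \cdots & \mathbf{h}_p(D_p) \end{bmatrix},$$

$$\mathbf{s}_p(n) = \left[s_p(n)\ \ s_p(n-1)\ \cdots\ s_p(n - M - D_p + 1)\right]^T$$

and

$$\underline{\mathbf{b}}(n) = \left[\mathbf{b}^T(n)\ \ \mathbf{b}^T(n-1)\ \cdots\ \mathbf{b}^T(n - M + 1)\right]^T.$$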
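
The stacked model (3) can be checked numerically for a single user without noise. The sketch below, written under the assumption that the channel is given as $D_p + 1$ taps of length $JN$, builds the block-Toeplitz matrix $\mathbf{H}_p$ and compares $\mathbf{H}_p \mathbf{s}_p(n)$ with snapshots computed directly from the convolution in (2). All names and the random test data are illustrative.

```python
import numpy as np

def channel_matrix(h, M):
    """Block-Toeplitz matrix of (3) built from taps h[d], d = 0..D (h has shape (D+1, JN))."""
    D1, JN = h.shape
    H = np.zeros((M * JN, M + D1 - 1), dtype=complex)
    for i in range(M):
        # Block row i holds h(0) ... h(D), shifted i columns to the right.
        H[i * JN:(i + 1) * JN, i:i + D1] = h.T
    return H

rng = np.random.default_rng(1)
D, JN, M, n = 2, 4, 3, 10
h = rng.standard_normal((D + 1, JN)) + 1j * rng.standard_normal((D + 1, JN))
s = rng.choice(np.array([1, -1, 1j, -1j]), size=20)      # QPSK-like symbols

def snapshot(k):
    """Noise-free x(k) = sum_d s(k - d) h(d), i.e. (2) for a single user."""
    return sum(s[k - d] * h[d] for d in range(D + 1))

stacked_direct = np.concatenate([snapshot(n - i) for i in range(M)])
s_vec = np.array([s[n - j] for j in range(M + D)])        # [s(n), ..., s(n - M - D + 1)]
stacked_model = channel_matrix(h, M) @ s_vec
```

The matrix has $M + D_p$ columns, matching the length of $\mathbf{s}_p(n)$ in (3).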

3.  SPACE-TIME  ADAPTIVE  PROCESSING 

Denote $\mathbf{w}(n)$ as the weight vector of a STAP system corresponding to the received data vector $\underline{\mathbf{x}}(n)$. Then, the output of the STAP system becomes

$$y(n) = \mathbf{w}^H(n)\,\underline{\mathbf{x}}(n). \tag{4}$$

By assuming that the reference signal is the ideal replica of the desired signal, the error signal is

$$e(n) = s_1(n - v) - y(n) = s_1(n - v) - \mathbf{w}^H(n)\,\underline{\mathbf{x}}(n), \tag{5}$$

where $v$ denotes some proper time delay [6]. By applying the LMS algorithm to the above system, the weight vector is updated by

$$\mathbf{w}(n + 1) = \mathbf{w}(n) + \mu_0\,\underline{\mathbf{x}}(n)\,e^*(n) \tag{6}$$

where $\mu_0 > 0$ is the step size. To ensure convergence, the step size $\mu_0$ should satisfy [7]

$$\mu_0 < 2/\lambda_{\max} \tag{7}$$

where $\lambda_{\max}$ is the maximum eigenvalue of the following covariance matrix

$$\mathbf{R} = E[\underline{\mathbf{x}}\,\underline{\mathbf{x}}^H] = \sum_{p=1}^{P} \mathbf{H}_p \mathbf{H}_p^H + \sigma\,\mathbf{I}_{JMN} \tag{8}$$

and $E[\cdot]$ denotes the statistical expectation operator. As it is often a complicated problem to estimate the maximum eigenvalue, the step size is, in practice, determined by the total power of the received signal vector, as

$$\mu_0(n) = \mu/P(n), \tag{9}$$

where $\mu < 2$ is a constant, and $P(n) = E[\underline{\mathbf{x}}^H(n)\,\underline{\mathbf{x}}(n)]$ is the power, which can be estimated recursively by the following equation

$$P(n + 1) = \alpha P(n) + (1 - \alpha)\,\underline{\mathbf{x}}^H(n)\,\underline{\mathbf{x}}(n) \tag{10}$$

for $0 < \alpha < 1$.
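
A minimal sketch of the LMS update (4)-(6) with the power-normalized step size (9)-(10) is given below. It is tested on a noise-free identification problem rather than on the channel model of Section 2, and the initialization of the power estimate from the first few snapshots is an implementation choice made here, not something specified in the paper. Function and variable names are illustrative.

```python
import numpy as np

def stap_lms(X, d, mu=0.5, alpha=0.9):
    """LMS with step size mu / P(n); X has one stacked data vector per row, d is the reference."""
    n_samples, L = X.shape
    w = np.zeros(L, dtype=complex)
    # Initial power estimate: average of x^H x over the first few snapshots (assumption).
    P = np.mean(np.sum(np.abs(X[:10]) ** 2, axis=1))
    err = np.empty(n_samples, dtype=complex)
    for n in range(n_samples):
        x = X[n]
        e = d[n] - np.vdot(w, x)                 # e(n) = d(n) - w^H(n) x(n), cf. (5)
        w = w + (mu / P) * x * np.conj(e)        # weight update (6) with step size (9)
        P = alpha * P + (1 - alpha) * np.vdot(x, x).real   # power recursion (10)
        err[n] = e
    return w, err

# Noise-free test problem: the reference is generated by a known weight vector.
rng = np.random.default_rng(2)
L = 4
X = (rng.standard_normal((2000, L)) + 1j * rng.standard_normal((2000, L))) / np.sqrt(2)
w_true = rng.standard_normal(L) + 1j * rng.standard_normal(L)
d = np.array([np.vdot(w_true, x) for x in X])
w_hat, err = stap_lms(X, d)
```

With white input data, as here, all eigenvalues of the covariance matrix are equal and convergence is fast; the slow modes discussed next only appear when the input is correlated.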

The  time  constants  of  the  convergence  are  given  by  [8] 

r,  =  -1— ,  1  <i<JMN  (11) 

where the $\lambda_i$ are the eigenvalues of the covariance matrix $\mathbf{R}$. Comparing (11) with (7), it is well known that the eigenvalue spread $\lambda_{\max}/\lambda_{\min}$ should be small to warrant fast convergence. However, the eigenvalue spread is usually large due to the channel dispersions and the high signal correlation across the virtual channels [6]. Several methods, including self-orthogonalizing methods and subspace-based methods, have been proposed to reduce the eigenvalue spread and to improve the convergence rate [7]. Nevertheless, these methods are complicated in the sense that they require either matrix inversion or eigen-decomposition of a large covariance matrix $\mathbf{R}$. In the next section, we consider simpler approaches based on subband decomposition.
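
The role of the eigenvalue spread can be illustrated numerically. The sketch below checks the bound (7) and computes the spread for a given covariance matrix. For the time constants it uses the common small-step approximation $\tau_i \approx 1/(\mu_0 \lambda_i)$, which is an assumption made here and not necessarily the exact form of (11). The function name and the example matrix are hypothetical.

```python
import numpy as np

def lms_convergence_summary(R, mu0):
    """Stability check (7), eigenvalue spread, and approximate time constants for LMS."""
    lam = np.linalg.eigvalsh(R)          # real eigenvalues of a Hermitian matrix, ascending
    return {
        "stable": bool(mu0 < 2.0 / lam[-1]),
        "spread": float(lam[-1] / lam[0]),
        # Assumed small-step approximation tau_i ~ 1 / (mu0 * lambda_i).
        "time_constants": 1.0 / (mu0 * lam),
    }

# Example: two modes with powers 1 and 10.
R = np.diag([1.0, 10.0])
summary = lms_convergence_summary(R, mu0=0.1)
```

In this example the step size is limited by the strong mode, so the weak mode converges ten times more slowly, which is the effect the subband techniques of the next section aim to reduce.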

4.  SUBBAND  ARRAYS 
4.1.  Subband  Decomposition 

Subband decomposition is realized by using a set of analysis and synthesis filters. Discrete Fourier transform (DFT) and modified-QMF filter banks are examples of perfect reconstruction and near-perfect reconstruction filter banks, respectively [4]. Decimation can be applied after the analysis filters to reduce the processing data rate, provided that the decimation rate does not exceed the number of subbands. Decimation often reduces the steady state system performance due to aliasing. Further, the perfect reconstruction property can be easily destroyed if adaptive techniques are employed after decimation due to changes in the aliasing characteristics. In this paper, no decimation is applied for subband signal components. Also, synthesis filters are not considered for simplicity.

Let the subband decomposition divide the data sequence at the output of the $i$th virtual channel, $x_i(n)$, into $K$ subband sequences, $x_i^{(1)}(n), \ldots, x_i^{(K)}(n)$, where the superscript $(k)$ denotes the data component at the $k$th subband. We define

$$\mathbf{x}_T(n) = \left[\left(\mathbf{x}_T^{(1)}(n)\right)^T\ \cdots\ \left(\mathbf{x}_T^{(K)}(n)\right)^T\right]^T$$

as the data vector for the subband arrays with

$$\mathbf{x}_T^{(k)}(n) = \left[x_1^{(k)}(n)\ \cdots\ x_{JN}^{(k)}(n)\right]^T.$$

The filter banks can be designed to reduce the cross-correlation between signal components in adjacent subbands. The correlation reduction permits processing the subband signals individually. In the following, two subband array techniques are proposed.

4.2.  Normalized  LMS  Subband  Arrays 

Processing the signal vector for a subband array, $\mathbf{x}_T(n)$, by the weight vector

$$\mathbf{w}_T(n) = \left[\left(\mathbf{w}_T^{(1)}(n)\right)^T\ \left(\mathbf{w}_T^{(2)}(n)\right)^T\ \cdots\ \left(\mathbf{w}_T^{(K)}(n)\right)^T\right]^T,$$

the output of the subband array becomes

$$y_T(n) = \mathbf{w}_T^H(n)\,\mathbf{x}_T(n) = \mathbf{w}_T^H(n)\,\mathbf{T}\,\underline{\mathbf{x}}(n) \tag{12}$$

and the error signal is given by

$$e_T(n) = s_1(n) - y_T(n) = s_1(n) - \mathbf{w}_T^H(n)\,\mathbf{x}_T(n). \tag{13}$$

The adaptive subband arrays can implement the normalized LMS algorithm, where the weight vector is updated by

$$\mathbf{w}_T(n + 1) = \mathbf{w}_T(n) + \mu\,\mathbf{P}^{-1}\,\mathbf{x}_T(n)\,e_T^*(n), \tag{14}$$

where $\mathbf{P} = \mathrm{diag}[P^{(k)},\ k = 1, 2, \ldots, K]$ is a diagonal matrix with the signal power at the $k$th subband, $P^{(k)} = E\left[(\mathbf{x}_T^{(k)})^H \mathbf{x}_T^{(k)}\right]$, as its $k$th diagonal element. $P^{(k)}$ is often estimated recursively by

$$P^{(k)}(n + 1) = \alpha P^{(k)}(n) + (1 - \alpha)\left(\mathbf{x}_T^{(k)}(n)\right)^H \mathbf{x}_T^{(k)}(n). \tag{15}$$

Unlike the STAP, where only one step size is defined, in the subband array, the equivalent step size

$$\mu^{(k)}(n) = \mu/P^{(k)}(n) \tag{16}$$

is used for the $k$th subband. Proper normalization of the step size can, indeed, greatly reduce the eigenvalue spread of the covariance matrix, specifically in the case when the signal arrivals have different power at different frequencies. The change in the signal power spectrum may be simply caused by the frequency selectivity of the channels. The eigenvalue spread due to a non-flat signal spectrum can be well compensated by adjusting the step sizes, provided that the signal correlation between different subbands is small.

However, subband array processing may still suffer from high signal correlations across the output of each virtual channel, which limit the convergence improvement. Below, we propose locally orthogonalized LMS subband arrays for further convergence improvement.
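
The per-subband normalization of (14)-(16) can be sketched as follows. The example applies the update to synthetic data whose subbands have very different powers and are mutually uncorrelated; it does not implement a filter bank, and the initialization of the power estimates from the first few snapshots is an assumption made here. Names such as `subband_nlms` are illustrative.

```python
import numpy as np

def subband_nlms(XT, d, mu=0.3, alpha=0.9):
    """Normalized LMS with one step size per subband; XT has shape (n_samples, K, JN)."""
    n_samples, K, L = XT.shape
    w = np.zeros((K, L), dtype=complex)
    # Initial per-subband power estimates from the first few snapshots (assumption).
    P = np.mean(np.sum(np.abs(XT[:10]) ** 2, axis=2), axis=0)
    for n in range(n_samples):
        x = XT[n]
        e = d[n] - np.vdot(w, x)                            # e_T(n) = d(n) - w_T^H x_T, cf. (13)
        w = w + (mu / P)[:, None] * x * np.conj(e)          # update (14) with step sizes (16)
        P = alpha * P + (1 - alpha) * np.sum(np.abs(x) ** 2, axis=1)   # power recursion (15)
    return w

# Synthetic subband data: K = 3 uncorrelated subbands with amplitudes 3, 1 and 0.3.
rng = np.random.default_rng(4)
K, L, n_samples = 3, 2, 3000
scales = np.array([3.0, 1.0, 0.3])
noise = (rng.standard_normal((n_samples, K, L)) + 1j * rng.standard_normal((n_samples, K, L))) / np.sqrt(2)
XT = scales[None, :, None] * noise
w_true = rng.standard_normal((K, L)) + 1j * rng.standard_normal((K, L))
d = np.array([np.vdot(w_true, x) for x in XT])
w_hat = subband_nlms(XT, d)
```

Because each subband is scaled by its own power estimate, the weak subband adapts at roughly the same rate as the strong one, which a single global step size would not achieve.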

4.3.  Locally  Orthogonalized  LMS  Subband  Arrays 

The locally orthogonalized LMS subband arrays perform eigen-decomposition separately at each subband and determine the step sizes based on their respective eigenvalues. It is important to note that, unlike the subspace-based LMS subband array techniques, where the eigen-decomposition of the covariance matrix $\mathbf{R}$ of size $JMN \times JMN$ could be computationally prohibitive, in the proposed method, the matrix at each subband is of dimension $JN \times JN$, which is considerably smaller and more amenable to fast computations.

Let $\mathbf{R}_T^{(k)} = E\left[\mathbf{x}_T^{(k)}(n)\left(\mathbf{x}_T^{(k)}(n)\right)^H\right]$ denote the signal covariance matrix of $\mathbf{x}_T^{(k)}(n)$ defined at the $k$th subband. Similar to the power estimation, $\mathbf{R}_T^{(k)}$ can also be estimated recursively by

$$\mathbf{R}_T^{(k)}(n + 1) = \beta\,\mathbf{R}_T^{(k)}(n) + (1 - \beta)\,\mathbf{x}_T^{(k)}(n)\left(\mathbf{x}_T^{(k)}(n)\right)^H \tag{17}$$

where $0 < \beta < 1$. Let $\mathbf{\Lambda}^{(k)} = \mathrm{diag}\left[\lambda_1^{(k)}, \ldots, \lambda_{JN}^{(k)}\right]$ be the diagonal eigenvalue matrix of $\mathbf{R}_T^{(k)}(n)$ and $\mathbf{F}^{(k)}$ the matrix with the corresponding eigenvectors as its columns. The vector $\mathbf{x}_T^{(k)}(n)$ can be decorrelated by using the following orthogonal projection,

$$\mathbf{x}_o^{(k)}(n) = \left(\mathbf{F}^{(k)}\right)^H \mathbf{x}_T^{(k)}(n). \tag{18}$$
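
As an illustration of (17) and (18) for a single subband, the sketch below estimates the covariance matrix recursively from correlated synthetic data, computes its eigen-decomposition, and applies the projection. The mixing matrix used to correlate the virtual channels and the value of $\beta$ are arbitrary choices made for this example, and the variable names are hypothetical.

```python
import numpy as np

def update_cov(R, x, beta=0.98):
    """One step of the recursive covariance estimate (17)."""
    return beta * R + (1 - beta) * np.outer(x, np.conj(x))

rng = np.random.default_rng(3)
JN = 4
# Arbitrary mixing matrix that makes the JN virtual channels correlated.
mix = rng.standard_normal((JN, JN)) + 1j * rng.standard_normal((JN, JN))

R = np.eye(JN, dtype=complex)
for _ in range(500):
    white = (rng.standard_normal(JN) + 1j * rng.standard_normal(JN)) / np.sqrt(2)
    x = mix @ white
    R = update_cov(R, x)

lam, F = np.linalg.eigh(R)        # eigenvalues (ascending) and orthonormal eigenvectors
x_o = F.conj().T @ x              # decorrelating projection (18) applied to the last snapshot
R_o = F.conj().T @ R @ F          # covariance estimate expressed in the projected coordinates
```

In the projected coordinates the estimated covariance is diagonal with the eigenvalues on its diagonal, so each component can be given its own step size.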

The  weight  vector  at  the  fctli  subband,  denoted  as  w((,A\  can 
be  updated  similar  to  the  normalized  LMS  subband  array 
as 

W {k\n  +  1)  =  W ik\n)  +  /t  (A(A))  ’  x(k) {n)r'„(n) ,  (19) 


5.  SIMULATION  RESULTS 

Computer  simulations  are  performed  to  demonstrate  the 
effectiveness  of  the  proposed  subband  array  techniques  in 
terms  of  convergence  performance  improvement  over  the 
conventional  STAP  systems.  A  three-element  array  with 
uniform  circular  arrangement  is  employed.  The  interele¬ 
ment  sparing  is  \/3  wavelength,  and  the  oversampling  factor 
is  J  =  2.  One  desired  user  and  two  cochannel  interferers  are 
considered.  All  of  the  user  signals  are  modulated  by  QPSK 
with  FIR  raised-cosine  pulse  shaping  filtering,  where  the 
rolloff  factor  is  set  to  1 .0.  Six  rays  are  randomly  generated 
for  each  user  signal.  The  different  parameters  arc  listed  in 
Tables  1-3,  where  8,  (j),  r,  and  £  are,  respectively,  the  eleva¬ 
tion  angle  of  arrival  (AOA),  azimuth  AOA,  time  delay,  and 
propagation  loss.  The  input  SNR  of  the  direct  ray  is  10  dB 
for  each  user  signal. 


where  e„(n)  is  the  error  signal  at  the  system  output. 
In this simulation example, 24 taps are used for the
STAP system, and the DFT filter bank is used to generate
24 subband bins for the subband arrays. The output
residual error power is shown in Fig. 1. It is evident
from this figure that the power normalization across the
subbands alone does not greatly improve the convergence
performance, whereas the locally orthogonalized subband
array yields a clear improvement.

6.  CONCLUSION 

In this paper, two simple subband array techniques were
proposed to perform spatio-temporal equalization with improved
convergence performance compared with conventional
STAP techniques. The normalized LMS subband
arrays and the locally orthogonalized LMS subband arrays
respectively decorrelate the impinging signals in the temporal
domain and the spatio-temporal domains, resulting in
improved convergence performance with proper adjustment
of the step sizes. The importance of decorrelating the signal
arrivals across the virtual channels was emphasized.
Table 1: Parameters of the desired signal

No.   θ (deg)   φ (deg)   τ (sym)   ξ
1     12.3      24.6      0         1.0
2     8.3       19.1      0.62      -0.18 - j0.77
3     26.7      6.2       1.72      -0.51 - j0.59
4     24.0      56.8      5.06      0.68 + j0.30
5     9.3       13.6      7.78      -0.13 - j0.60
6     26.6      37.3      7.90      0.33 - j0.31

Table 2: Parameters of interferer 1

No.   θ (deg)   φ (deg)   τ (sym)   ξ
1     8.6       33.6      0         1.0
2     19.0      53.4      2.31      0.85 + j0.18
3     12.4      66.6      2.65      2.23 - j0.68
4     29.0      49.5      5.30      0.62 - j0.32
5     22.1      53.5      5.62      -0.27 + j0.50
6     26.2      73.7      7.55      0.05 - j0.56

Table 3: Parameters of interferer 2

No.   θ (deg)   φ (deg)   τ (sym)   ξ
1     6.6       120.6     0         1.0
2     12.2      149.8     4.04      0.14 + j0.90
3     6.3       135.4     6.04      0.05 + j0.67
4     14.7      131.7     6.56      -0.49 + j0.42
5     17.9      133.5     9.65      0.28 - j0.38
6     7.0       134.1     11.02     0.10 + j0.42

Fig.  1  Comparison  of  the  residual  error  power  of  STAP 
and  subband  arrays. 
ABSTRACT

The fading encountered in multipath mobile communication channels
is often the primary cause of degradation in communication
system performance. There are many situations in mobile communications
in which it would be advantageous for a communications
system to have real time information on how a signal will fade in
advance of the fade actually occurring. This paper looks at the
ways in which this real time prediction of the mobile channel can
be achieved. Physical models of mobile channels which allow prediction
are discussed. Algorithms based on these models, and their
performance, are presented. Performance bounds for model based
prediction based on the Cramer Rao bound are also derived. The
algorithms are also applied to measured channel data.

1.  INTRODUCTION 

In multipath propagation environments, the signal received by
an antenna has a complex Gaussian distribution, provided there
are many multipath components. The fading envelope, which has
a Rayleigh or Rician distribution, is a dominant feature in mobile
radio communications.

There are many situations in mobile communications in which it
would be advantageous for a communications system to have information
on how a signal will fade in advance of the fade actually
occurring. If the timing of a fade is known sufficiently in advance,
there will be sufficient time for corrective action to be negotiated
between the transmitter and receiver [1,2]. This corrective action
may for example take the form of change of time-slot (for a TDMA
system), change of frequency (for an FDMA system), change of
power level, or change of coding scheme. Several approaches to
continuous power adjustment have been suggested, but these suffer
when the channel information available to the transmitter is
obsolete. The information will be obsolete for a rapidly varying
channel. Channel prediction can overcome this problem. A similar
argument applies to improvement of transmit antenna diversity
systems and the Multiple-Input Multiple-Output (MIMO) systems
which require channel information at the transmitter.

The statistical modelling of the channels is often taken to imply
that the channel variation is random, i.e., cannot be predicted further
than the correlation distance (or time) of the changing channel.
However, by introducing a physical model for the multipath, it becomes
possible, in principle, to apply signal processing to predict
the channel over distances much longer than the correlation distance.
The idea is to model the channel, and use the samples along
a known spatial trajectory to estimate the channel model parameters.
The channel model can then be used to extrapolate beyond
the region of the measurements.

In this paper, a model for the fading channel of a radio communications
signal is presented, and bounds for prediction based on the
model are established.

2.  NARROWBAND  FADING  CHANNEL  MODEL 

The model primarily used in this paper is that of far-field scatterers
of a narrowband signal surrounding the receiver. This model
is derived and used for example in [3,4]. The measurement segment
becomes in effect a synthetic array, and the relative delay to
each of the elements of this array is used as the basis for localising
the scatterers. It is assumed that samples $\{r_m\}$ of the complex
channel gain can be simply derived from the receiver (i.e.,
the data sequence is known at the receiver, or a decision feedback
mechanism is used). These samples may be described using:

$$r_m = \sum_{n=1}^{N} \zeta_n\, e^{j\frac{2\pi}{\lambda} m \Delta x \sin\theta_n} + \eta_m \qquad (1)$$

where

$r_m$ is the $m$th complex channel sample,

$N$ is the number of discrete scatterers present,

$\theta_n$ is the bearing of the scatterer location,

$\Delta x$ is the distance travelled by the receiver between channel
samples,

$\lambda$ is the signal carrier wavelength,

$\eta_m$ is a sampled complex Gaussian noise process of zero
mean, assumed to be white (or whitened) with variance
$\sigma_\eta^2$ so that $E\{\eta_{m_1}\eta_{m_2}^{*}\} = \sigma_\eta^2\,\delta_{m_1 m_2}$,

$\zeta_n$ is the complex attenuation of a far field scattering
point at angle $\theta_n$. This term includes factors such as
space loss and polarisation mismatch.

Equation (1) can be seen to be equivalent to the model of complex
sinusoids in noise. The Doppler frequency of scatterer $n$ is defined
as $\varpi_n = \frac{2\pi}{\lambda}\Delta x \sin\theta_n$. The model can be expressed in matrix
form as

$$\mathbf{r} = \mathbf{A}(\boldsymbol{\varpi})\boldsymbol{\zeta} + \boldsymbol{\eta} \qquad (2)$$

where

$$\mathbf{A}(\boldsymbol{\varpi}) = [\mathbf{a}(\varpi_1), \mathbf{a}(\varpi_2), \ldots, \mathbf{a}(\varpi_N)]$$

$$\mathbf{a}(\varpi) = [1, e^{j\varpi}, \ldots, e^{j\varpi(M-1)}]^T,$$

$$\boldsymbol{\varpi} = [\varpi_1, \ldots, \varpi_N]^T, \qquad \mathbf{r} = [r_1, \ldots, r_M]^T,$$

$$\boldsymbol{\zeta} = [\zeta_1, \ldots, \zeta_N]^T, \qquad \boldsymbol{\eta} = [\eta_1, \ldots, \eta_M]^T$$

0-7803-701 1-2/01/$10.00  ©2001  IEEE 




and $M$ is the number of complex samples used for the prediction.
The channel samples $\{r_m\}$, called the measurement segment, are
known. The problem is to use only this information to estimate as
accurately as possible the values in the prediction segment
$\{r_m \mid M+1 \le m \le M_2\}$. This is achieved by first estimating the
parameters of the model $N$, $\boldsymbol{\zeta}$, and $\boldsymbol{\varpi}$, and using the estimated
values to extrapolate the complex sinusoids.
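To make the signal model of (1) and (2) concrete, the following sketch synthesises channel samples for a handful of discrete scatterers. It is an illustration only: the function names, the zero-based sample index and the way the noise is scaled are assumptions of this sketch rather than details taken from the paper.

```python
import numpy as np

def doppler_frequencies(theta, dx_over_lambda):
    """varpi_n = 2*pi*(dx/lambda)*sin(theta_n), in radians per sample."""
    return 2.0 * np.pi * dx_over_lambda * np.sin(np.asarray(theta))

def channel_samples(zeta, varpi, M, noise_std=0.0, rng=None):
    """Samples r = A(varpi) zeta + eta for sample indices m = 0..M-1."""
    m = np.arange(M)
    A = np.exp(1j * np.outer(m, varpi))          # M x N matrix of sinusoids
    r = A @ np.asarray(zeta, dtype=complex)
    if noise_std > 0.0:
        rng = np.random.default_rng() if rng is None else rng
        eta = rng.standard_normal(M) + 1j * rng.standard_normal(M)
        r = r + noise_std * eta / np.sqrt(2.0)   # total variance noise_std**2
    return r
```

For example, `channel_samples([1.0, 0.5j], doppler_frequencies([0.2, -1.0], 0.1), 40)` produces a noise-free measurement segment of 40 samples for two scatterers.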

The  minimum  description  length  (MDL)  criterion  for  estimation  of 
the  “model  order”  N  is  known  to  be  both  unbiased  and  consistent. 
The  version  of  MDL  presented  in  [5]  is  used  in  this  paper. 

The maximum likelihood estimates of the remaining parameters
can be shown to be $\hat{\boldsymbol{\zeta}} = \mathbf{A}(\hat{\boldsymbol{\varpi}})^{+}\mathbf{r}$ where

$$\hat{\boldsymbol{\varpi}} = \arg\max_{\boldsymbol{\varpi}}\; \mathbf{r}^H \mathbf{A}(\boldsymbol{\varpi})\mathbf{A}(\boldsymbol{\varpi})^{+}\mathbf{r} \qquad (3)$$

and the superscript $+$ represents the Moore-Penrose generalised
inverse. The maximum likelihood solution requires an $N$ dimensional
search which is generally impractical. Subspace methods
of estimating $\boldsymbol{\varpi}$ have been shown (e.g., see references in [6]) to
achieve results close to the Cramer Rao bound provided the
signal-to-noise ratio (SNR) is adequate, and these have been used in this
paper.
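Once frequency estimates are available, the amplitude step and the extrapolation are linear operations. The sketch below assumes the frequencies have already been estimated by some other means (for instance a subspace method, which is not implemented here) and uses NumPy's pseudo-inverse for the amplitude estimate; the function names are hypothetical.

```python
import numpy as np

def steering(varpi, m):
    """Matrix of complex sinusoids evaluated at sample indices m."""
    return np.exp(1j * np.outer(m, varpi))

def fit_and_extrapolate(r, varpi_hat, n_ahead):
    """Least-squares amplitudes zeta_hat = A^+ r, then extrapolation of the
    sinusoids over the next n_ahead samples (the prediction segment)."""
    M = len(r)
    A = steering(varpi_hat, np.arange(M))
    zeta_hat = np.linalg.pinv(A) @ r
    future = np.arange(M, M + n_ahead)
    return zeta_hat, steering(varpi_hat, future) @ zeta_hat
```

With exact frequencies and no noise the extrapolation is exact; in practice the usable range is limited by the frequency estimation error, which is what the bound in the next section quantifies.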


3. CRAMER RAO BOUND


3.1.  Bound  Formulation 

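From equation (2), it can be seen that $\mathbf{r}$ is a complex Gaussian
random vector with mean

$$\boldsymbol{\mu} = \mathbf{A}(\boldsymbol{\varpi})\boldsymbol{\zeta}, \qquad \mu_m = \sum_{n=1}^{N} \zeta_n e^{j\varpi_n m} \qquad (4)$$

and variance $\mathbf{C} = \sigma_\eta^2 \mathbf{I}$. Calculation of the bound is facilitated
by expressing the attenuation as the real parameters amplitude and
phase; thus $\zeta_n = \varsigma_n e^{j\psi_n}$, so the parameter vector $\boldsymbol{\xi}$ is
$(\varsigma_1, \psi_1, \varpi_1, \ldots, \varsigma_N, \psi_N, \varpi_N)^T$. Using Bangs' formula [7], the elements of
the Fisher information matrix are given by

$$[\mathbf{J}(\boldsymbol{\xi})]_{ij} = \mathrm{tr}\left[\mathbf{C}^{-1}\frac{\partial \mathbf{C}}{\partial \xi_i}\mathbf{C}^{-1}\frac{\partial \mathbf{C}}{\partial \xi_j}\right] + 2\,\mathrm{Re}\left[\frac{\partial \boldsymbol{\mu}^H}{\partial \xi_i}\mathbf{C}^{-1}\frac{\partial \boldsymbol{\mu}}{\partial \xi_j}\right]$$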


Differentiating  with  respect  to  each  of  the  parameters,  the  Fisher 
information  matrix  can  be  shown  to  consist  of  3  x  3  blocks  of  the 
form: 


(5) 


where $n_1$ and $n_2$ are the row and column indices for the 3 x 3
blocks, and $\odot$ represents element by element multiplication.

To obtain an estimate of typical prediction performance, this Fisher
Information Matrix is calculated using parameters taken from the
following distributions: the complex amplitudes $\zeta_n$ are independent
identically distributed (iid) zero-mean complex Gaussian variables
with variance of 1, and $\theta_n$ are iid uniform over (0f , f ] with
0  <  <  7 r. 


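The error between the actual and predicted complex envelope is

$$|e[m]| = \left|\sum_{n=1}^{N}\varsigma_n e^{j\psi_n}e^{j\varpi_n m} - \sum_{n=1}^{N}\hat{\varsigma}_n e^{j\hat{\psi}_n}e^{j\hat{\varpi}_n m}\right| \qquad (6)$$

Taking a first order approximation for $e[m]$ and defining $\Delta\varsigma_n$, $\Delta\psi_n$
and $\Delta\varpi_n$ as the difference between the actual and estimated values
of each of the symbols,

$$e[m] \approx \sum_{n=1}^{N}\left(\frac{\partial r[m]}{\partial \varsigma_n}\Delta\varsigma_n + \frac{\partial r[m]}{\partial \psi_n}\Delta\psi_n + \frac{\partial r[m]}{\partial \varpi_n}\Delta\varpi_n\right). \qquad (7)$$

Taking the expectation

$$E\{|e[m]|^2\} \approx \sum_{n_1=1}^{N}\sum_{n_2=1}^{N} h_{n_1 n_2}$$

where $h_{n_1 n_2}$ is the sum of the elements of the matrix

$$[\,\cdots\,] \odot \left(\mathbf{J}^{-1}\right)_{n_1 n_2} \qquad (8)$$

and $[\mathbf{J}^{-1}]_{n_1 n_2}$ is the 3 x 3 block of the Fisher inverse with $n_1$ and
$n_2$ being the row and column indices for the 3 x 3 blocks. This
assumes of course that the bound is nearly achieved so that
$E\{(\hat{\boldsymbol{\xi}} - \boldsymbol{\xi})(\hat{\boldsymbol{\xi}} - \boldsymbol{\xi})^H\} - \mathbf{J}^{-1}$ is not merely a positive semi-definite matrix [8,
p82] as required by the Cramer-Rao inequality, but actually close
to the zero matrix.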

It is immediately apparent that for a large prediction range (large
$m$), the most critical parameter is the frequency $\varpi$, since the variance
of this estimate is multiplied by $m^2$ to calculate the overall
prediction error.
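The growth of the error with prediction range can be checked numerically for a single unit-amplitude tone. The sketch below is a simplified illustration under that single-tone assumption (the function name is hypothetical): a frequency error $\Delta\varpi$ produces an envelope error of about $m\,\Delta\varpi$, so the squared error grows as $m^2$.

```python
import numpy as np

def tone_prediction_error(d_varpi, m):
    """|exp(j*(varpi + d_varpi)*m) - exp(j*varpi*m)| for a unit-amplitude tone.
    The common factor exp(j*varpi*m) has unit modulus and drops out."""
    return np.abs(np.exp(1j * d_varpi * np.asarray(m)) - 1.0)

m = np.array([10, 100])
err = tone_prediction_error(1e-4, m)   # close to 1e-4 * m for small arguments
```

A tenfold increase in prediction range therefore costs roughly a hundredfold increase in squared error for the same frequency accuracy.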

3.2.  Invertibility  of  the  Fisher  Information  Matrix 

A simulation of the scenario described above led to the discovery
that a large proportion of the Fisher information matrices were
very poorly conditioned. It is conjectured in [9] that the Fisher
information matrix is singular only if two or more of the tone
frequencies are equal, modulo $2\pi$, assuming $M$ is large enough.
Conditions affecting the likelihood of the Fisher information matrix
being close to singular are the number of scatterers present $N$,
the number of elements in the virtual array $M$, and the length of
the virtual array $(M-1)\Delta x$.

The number of scatterers for which the parameters may be reliably
estimated for a given measurement segment length was investigated.
The number of scatterers in a given scenario was reduced
by eliminating scatterers for which $\varpi_{n_1}$ was close to that of another
scatterer $\varpi_{n_2}$, until the LINPACK reciprocal condition estimate
[10] was larger than $10^{-15}$. The results are shown in figure 1.
If the actual number of scatterers $N'$ is larger than the number $N$
presented in figure 1, then the parameters of some of the scatterers
cannot be estimated, even though two scatterers which are very
close in Doppler frequency may be effectively modelled as one
scatterer.
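The effect of closely spaced Doppler frequencies on conditioning can be illustrated with a simpler proxy than the full Fisher information matrix. The sketch below examines the Gram matrix $\mathbf{A}^H\mathbf{A}$ of the sinusoid matrix; using this proxy, and NumPy's 2-norm condition number in place of the LINPACK estimate, are assumptions of this illustration and not the procedure of the paper.

```python
import numpy as np

def reciprocal_condition(varpi, M):
    """Reciprocal 2-norm condition number of A^H A for tones varpi, M samples."""
    A = np.exp(1j * np.outer(np.arange(M), varpi))
    G = A.conj().T @ A
    return 1.0 / np.linalg.cond(G)

well_separated = reciprocal_condition([0.3, 1.5], 40)
nearly_equal = reciprocal_condition([0.3, 0.3 + 1e-6], 40)
```

The nearly equal pair gives a reciprocal condition number many orders of magnitude smaller than the well separated pair, mirroring the behaviour described for the Fisher information matrix.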

The  power  of  the  scatterers  not  parametrised  has  two  effects.  First 
it  decreases  the  overall  effective  SNR,  since  some  of  the  scatterers 
effectively  become  noise,  and  secondly  it  causes  an  increase  in  the 
error  between  the  predicted  and  actual  signal. 


j  ?ri2 

<*n2 

jrn 


^nl 
Figure 1: The number of scatterers which can be modelled as a
function of the measurement segment length, using invertibility of
the Fisher Information Matrix as criterion. The shaded region is
that for which the number of real parameters to be estimated is
less than or equal to the number of real measurements available.

If $e_1[m]$ is the error that would be expected in the case where
$N = N'$ and for unity SNR, and $P$ is the power of the "ignored"
scatterers where the power of all scatterers is unity, the expected
error now becomes

$$E\{|e[m]|^2\} = (\sigma_\eta^2 + P)\,E\{|e_1[m]|^2\} + P. \qquad (9)$$

3.3.  Performance  Measures 

In the simulations of section 4 the performance criterion is the distance
(in wavelengths) for which the predicted and actual signal
envelopes differ by less than 20% of the root mean square (RMS)
value of the envelope in the measurement segment. As can be seen
in the example of prediction behaviour presented in figure 2, this
may be a pessimistic criterion on many occasions, since the error
even in the measurement segment may exceed 20%, especially
when the SNR is small. The performance measure used in [1]
(distance for which the predicted and actual signal envelopes are
within 5% of the maximum amplitude value in the measurement
segment) was used for very high SNR, and is not practical for the
SNR values considered here.
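The 20% criterion can be written as a short routine. This is a sketch under stated assumptions: the function name, the argument layout and the convention of counting samples up to the first violation are choices made here for illustration.

```python
import numpy as np

def prediction_range(actual, predicted, rms_ref, dx_wavelengths, tol=0.2):
    """Distance (in wavelengths) over which the predicted and actual envelopes
    differ by less than tol times the RMS envelope of the measurement segment."""
    err = np.abs(np.abs(actual) - np.abs(predicted))
    bad = np.nonzero(err >= tol * rms_ref)[0]
    n_ok = bad[0] if bad.size else len(err)   # samples before the first failure
    return n_ok * dx_wavelengths

rng_len = prediction_range(np.array([1.0, 1.0, 1.0, 1.0]),
                           np.array([1.0, 1.1, 1.3, 1.0]),
                           rms_ref=1.0, dx_wavelengths=0.05)
```

In the small example the third sample is the first to violate the tolerance, so two samples, or 0.1 wavelengths, count as successfully predicted.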

The performance measure used in evaluating the CRLB was slightly
different. Equations 8 and 9 yield an expected square error $E\{|e|^2\}$.
Prediction in this case was said to be valid to the point where this
error exceeded 0.04. Where there are more than about 10 scatterers
present, those that are modelled as noise result in the mean square
error being always larger than this number, even though prediction
will not necessarily always fail. For $N < 10$ and for long
measurement distances, however, there is reasonable agreement between
the bound and the simulations, as shown by comparison of
figure 3 and the simulations of figure 4.

4.  SIMULATION 

Simulations were used to provide an indication of the performance
of the subspace algorithms when used to enable prediction using
the far-field discrete scatterers model. The SNR was 20 dB with
expected signal power of unity, with $M = 40$, and $N$ chosen by
the MDL criterion. The number of independent scenarios used to
find the mean performance presented in the graphs was 3000.

Figure 2: Example of Prediction Behaviour. Only the measured
data to the left of the first vertical line is used for prediction. The
region between the two vertical lines is where the predicted envelope
is within 20% of the actual envelope. The RMS level of the
measured envelope is 1, so prediction is said to "fail" when the
error first exceeds 0.2.

In figure 4 (unmarked lines) the prediction performance is presented
as a function of the length (in wavelengths) of the measurement
segment. A selection of subspace methods was used
(MUSIC, ESPRIT, PCLP and Minimum Norm), and whichever
of these yielded the lowest Mean Square Error (MSE) of the predicted
channel in the measurement segment was selected for each
scenario.

Prediction beyond about $0.2\lambda$ is not achieved until the length of the
measurement segment exceeds $2\lambda$ - even when the actual scatterer
locations are known perfectly.

For longer measurement segments significant prediction performance
was obtained, the prediction range rising rapidly (between
the 1.2th and 1.7th power) with measurement segment length. Where
the number of scatterers is very large, however, the prediction performance
actually decreases when the measurement segment length
increases above about $1\lambda$. This is actually an artefact of the spacing
between samples increasing. The spacing is the same in the
prediction segment as in the measurement segment and so the prediction
length is observed as zero if it is less than the inter-sample
distance.

A practical limitation to long range prediction appears to be the
number of significant scatterers being large, which means that a
long measurement segment is required. In practice, long measurement
segments become impractical because it is unlikely that
any physical scenario will remain static over trajectories of several
wavelengths, except perhaps at very high carrier frequencies.

5.  MEASUREMENTS 

The  prediction  algorithms  were  applied  to  measured  data  as  well 
as  to  synthesised  data.  The  measurements  were  taken  at  several 
frequencies,  mostly  1.92  and  5.9  GHz.  There  were  three  different 
locations  used  -  inside  a  laboratory,  outdoors,  and  inside  a  large 
workshop. 
Figure 3: Prediction range based on CRLB (plot title: Prediction Length, $E(|e|^2) < 0.04$)

The subspace algorithms were applied to a series of data points
measured in the laboratory (an irregularly shaped room of dimensions
approximately 19 x 10 x 3 metres) at 1.92 GHz. This example
is typical of measurements made in this and the other two
locations. The channel was measured at 999 points spaced evenly
over a 3 metre distance. Averaging of 50 scenarios was obtained
by starting the measurement segment at different points in the data
set, and by using different but relatively close frequencies. The
likelihood of each frequency estimate $\hat{\varpi}$ was increased by using a
gradient method before estimating the amplitudes $\boldsymbol{\zeta}$.

The results are shown in the triangle-marked lines of figure 4, with
upper and lower 95% confidence limits. It is clear that prediction
does not improve significantly with increasing measurement
segment length. There are two likely explanations of this. The
first is that the scenario is not sufficiently static. The scatterers
may be too close to the receiver for the far field model to be valid,
or the scatterers may themselves move. The second is that the
model is valid, but the number of scatterers is large. As the simulations
show, if the scatterers are many, prediction is very limited.
It would be expected that if there are many scatterers, the field
becomes equivalent to a diffuse field, and prediction is truly confined
to the correlation distance. If the scatterers are uniformly
distributed  in  angle,  this  correlation  distance  is  approximately  | 
[e.g.,3]. 

6.  CONCLUSIONS 

A channel model based on narrow-band, far-field discrete scatterers
has been presented. This model in principle allows long term
prediction of channel behaviour based only on a priori samples of
the channel.

A  form  of  the  CRLB  for  prediction  error  has  been  derived  and 
applied,  and  the  performance  has  also  been  evaluated  on  simulated 
channels. 

It is widely assumed that the number of significant scatterers is
small [11]. This assumption has been found to be critical to the
viability of long range prediction. Several statistical measures of the
channel are the same for a small or a large number of scatterers.
However, the former can be parametrised to obtain long-range prediction,
whereas the latter cannot. Evidence to date suggests that
the number of significant scatterers in many situations is large.


Figure 4: Prediction Performance versus Measurement Segment
Length for both simulated data (points unmarked) and measured
data (points marked with triangles) using subspace parameter
estimation.
ABSTRACT

The filters constituting the minimum mean square error
decision-feedback equalizer (MMSE-DFE), as well
as related performance measures, can be computed by
assuming perfect knowledge of the channel impulse response
and the input and noise second-order statistics
(SOS). In practice, we estimate the unknown channel
and SOS, and inevitable estimation errors arise. We
model estimation errors as small perturbations, i.e., of
order $\epsilon$, with $\epsilon$ a sufficiently small positive number, and
we study the behavior of the MMSE-DFE under mismatch
by performing a first-order perturbation analysis.
We prove that the excess MSE induced by $O(\epsilon)$
estimation errors is $O(\epsilon^2)$, uncovering important robustness
properties associated with the MMSE-DFE.

1.  INTRODUCTION 


The finite-length minimum mean square error decision-feedback
equalizer (MMSE-DFE) has proved to be an
efficient structure toward ISI mitigation in packet-based
communication systems. The MMSE-DFE is determined
by the feedforward and the feedback filter, which
can be computed by assuming perfect knowledge of the
channel impulse response and the input and additive
channel noise second-order statistics (SOS) [1].

In practice, the channel impulse response and the
input and noise SOS are unknown and we estimate
them either by training or blindly. Thus, inevitable
estimation errors arise. The robustness of the MMSE-DFE
with respect to mismatch was first considered in
[2], where the authors developed closed-form expressions
for the "perturbed" MMSE-DFE filters and the
corresponding performance measures. However, for the
evaluation of these expressions they have to resort to
computer simulations. Consequently, we feel that the
analysis of [2] does not provide much analytical insight
into the behavior of the MMSE-DFE under mismatch.

We model the channel impulse response and SOS estimation
errors as perturbations of order $\epsilon$, with $\epsilon$ being
a small positive number, and we study the behavior of
the MMSE-DFE under mismatch by using a first-order
perturbation analysis. We show that the excess mean
square error induced by $O(\epsilon)$ errors is $O(\epsilon^2)$. Simulations
show that the range of $\epsilon$ for which our first-order
analysis remains valid depends on the SNR.

Figure 1: Channel model.


This  work  was  supported  by  the  EPETII  Program  of  the 
Greek  Secretariat  for  Research  and  Technology. 


2.  FINITE-LENGTH  MMSE-DFE 


2.1.  Channel  Model 


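We consider the baseband discrete-time fractionally sampled
noisy communication channel modeled by the $\nu$-th
order 1-input/$p$-output linear time-invariant system
depicted in Fig. 1. Its input-output relation is

$$\mathbf{y}_n = \sum_{i=0}^{\nu} \mathbf{h}_i x_{n-i} + \mathbf{n}_n,$$

where $x_n$ denotes the input sequence and the $p \times 1$ vectors
$\mathbf{y}_n$, $\mathbf{n}_n$ and $\mathbf{h}_i$ denote, respectively, the terms of
the output, noise and channel finite impulse response
sequences. We define the impulse response parameter
vector $\mathcal{H}_\nu \triangleq [\mathbf{h}_0^H \cdots \mathbf{h}_\nu^H]^H$, where superscript $H$
denotes Hermitian transpose. The data vector

$$\mathbf{y}_{n:n-N_f+1} \triangleq [\mathbf{y}_n^H \cdots \mathbf{y}_{n-N_f+1}^H]^H$$

can be expressed as

$$\mathbf{y}_{n:n-N_f+1} = \mathbf{H}\,\mathbf{x}_{n:n-N_f-\nu+1} + \mathbf{n}_{n:n-N_f+1}$$

where the $pN_f \times (\nu + N_f)$ matrix $\mathbf{H}$ is defined by

$$\mathbf{H} \triangleq \begin{bmatrix} \mathbf{h}_0 & \cdots & \mathbf{h}_\nu & & \\ & \ddots & & \ddots & \\ & & \mathbf{h}_0 & \cdots & \mathbf{h}_\nu \end{bmatrix}$$

and the definitions of $\mathbf{x}_{n:n-N_f-\nu+1}$ and $\mathbf{n}_{n:n-N_f+1}$ are
obvious.

2.2. Finite-length MMSE-DFE

Our aim is to recover (a delayed version of) the input
sequence $x_n$ by passing the noisy output data $\mathbf{y}_n$
through the finite-length DFE depicted in Fig. 2. The
DFE is determined by the following parameter vectors:

1. $\mathbf{w} = [\mathbf{w}_0^H \cdots \mathbf{w}_{N_f-1}^H]^H$, which denotes the
$p$-input/1-output length-$N_f$ feedforward filter;

2. $\mathbf{b} = [1\; b_1^{*} \cdots b_{N_b}^{*}]^H$, which determines the
single-input/single-output length-$N_b$ strictly causal feedback
filter. The settings of the feedback filter
are $\{-b_1^{*}, \ldots, -b_{N_b}^{*}\}$, where superscript $*$ denotes
complex conjugate.

Figure 2: Finite-length DFE.

Assuming that the past decisions are correct and considering
delay $\Delta$, the error between the desired output
and the input to the decision device $\mathcal{D}$ is given by [1]

$$e_n = x_{n-\Delta} - \left(\sum_{i=0}^{N_f-1} \mathbf{w}_i^H \mathbf{y}_{n-i} - \sum_{i=1}^{N_b} b_i^{*} x_{n-\Delta-i}\right)$$

$$= \mathbf{b}^H \mathbf{x}_{n-\Delta:n-\Delta-N_b} - \mathbf{w}^H \mathbf{y}_{n:n-N_f+1}$$

$$= \bar{\mathbf{b}}^H \mathbf{x}_{n:n-N_f-\nu+1} - \mathbf{w}^H \mathbf{y}_{n:n-N_f+1}$$

where we have defined $\bar{\mathbf{b}} = [\mathbf{0}_{1\times\Delta}\; \mathbf{b}^H\; \mathbf{0}_{1\times s}]^H$, with
$\mathbf{0}_{i\times j}$ denoting the $i \times j$ zero matrix and $s = N_f + \nu - \Delta - N_b - 1$.
We simplify notation by omitting the subscripts
from $\mathbf{x}_{n:n-N_f-\nu+1}$, $\mathbf{y}_{n:n-N_f+1}$ and $\mathbf{n}_{n:n-N_f+1}$.

The MMSE-DFE settings are computed by minimizing
the mean square error (MSE) $E[|e_n|^2]$

$$\mathrm{MSE} = E\left[(\bar{\mathbf{b}}^H\mathbf{x} - \mathbf{w}^H\mathbf{y})(\mathbf{x}^H\bar{\mathbf{b}} - \mathbf{y}^H\mathbf{w})\right]$$

$$= \bar{\mathbf{b}}^H\mathbf{R}_{xx}\bar{\mathbf{b}} - \bar{\mathbf{b}}^H\mathbf{R}_{xy}\mathbf{w} - \mathbf{w}^H\mathbf{R}_{yx}\bar{\mathbf{b}} + \mathbf{w}^H\mathbf{R}_{yy}\mathbf{w} \qquad (1)$$

where

$$\mathbf{R}_{xx} = E[\mathbf{x}\mathbf{x}^H], \qquad \mathbf{R}_{xy} = E[\mathbf{x}\mathbf{y}^H] = \mathbf{R}_{xx}\mathbf{H}^H = \mathbf{R}_{yx}^H$$

$$\mathbf{R}_{yy} = E[\mathbf{y}\mathbf{y}^H] = \mathbf{H}\mathbf{R}_{xx}\mathbf{H}^H + \mathbf{R}_{nn}, \qquad \mathbf{R}_{nn} = E[\mathbf{n}\mathbf{n}^H].$$

At the optimal settings, $E[e_n\mathbf{y}^H] = \mathbf{0}_{1\times pN_f}$, yielding

$$\mathbf{R}_{yx}\bar{\mathbf{b}} = \mathbf{R}_{yy}\mathbf{w} \quad\Longrightarrow\quad \mathbf{w} = \mathbf{R}_{yy}^{-1}\mathbf{R}_{yx}\bar{\mathbf{b}}.$$

Substituting the above expression for $\mathbf{w}$ into (1), we
obtain $\mathrm{MSE} = \bar{\mathbf{b}}^H\mathbf{R}\bar{\mathbf{b}}$, where

$$\mathbf{R} \triangleq \mathbf{R}_{xx} - \mathbf{R}_{xy}\mathbf{R}_{yy}^{-1}\mathbf{R}_{yx}. \qquad (2)$$

If we define

$$\mathbf{R}_\Delta \triangleq [\mathbf{0}_{(N_b+1)\times\Delta}\;\; \mathbf{I}_{N_b+1}\;\; \mathbf{0}_{(N_b+1)\times s}]\;\mathbf{R}\;\begin{bmatrix}\mathbf{0}_{\Delta\times(N_b+1)} \\ \mathbf{I}_{N_b+1} \\ \mathbf{0}_{s\times(N_b+1)}\end{bmatrix},$$

where $\mathbf{I}_i$ denotes the $i \times i$ identity matrix, then the
MSE is expressed as $\mathrm{MSE} = \mathbf{b}^H\mathbf{R}_\Delta\mathbf{b}$ and it can be
shown that it is minimized for [1]

$$\mathbf{b}_o = \frac{\mathbf{R}_\Delta^{-1}\mathbf{e}_0}{\mathbf{e}_0^H\mathbf{R}_\Delta^{-1}\mathbf{e}_0}, \qquad \mathbf{w}_o = \mathbf{R}_{yy}^{-1}\mathbf{R}_{yx}\bar{\mathbf{b}}_o \qquad (3)$$

where $\mathbf{e}_0$ is the vector with 1 at the first position and
zeros elsewhere. The minimum MSE is given by

$$\mathrm{MMSE} = \bar{\mathbf{b}}_o^H\mathbf{R}\,\bar{\mathbf{b}}_o = \mathbf{b}_o^H\mathbf{R}_\Delta\mathbf{b}_o = \frac{1}{\mathbf{e}_0^H\mathbf{R}_\Delta^{-1}\mathbf{e}_0}. \qquad (4)$$

3. MMSE-DFE: PERFORMANCE
ANALYSIS UNDER MISMATCH

3.1. The framework

Let us assume that an estimation procedure has furnished
the estimates $\{\hat{\mathbf{h}}_i\}_{i=0}^{\hat{\nu}}$, leading to the impulse
response vector estimate $\hat{\mathcal{H}}_{\hat{\nu}} = [\hat{\mathbf{h}}_0^H \cdots \hat{\mathbf{h}}_{\hat{\nu}}^H]^H$. We
consider the case $\hat{\nu} \le \nu$. In order to compare the unequal
length vectors $\mathcal{H}_\nu$ and $\hat{\mathcal{H}}_{\hat{\nu}}$, we augment $\hat{\mathcal{H}}_{\hat{\nu}}$ with
leading and trailing zeros, obtaining

$$\hat{\mathcal{H}}_\nu^{(m_1)} \triangleq [\mathbf{0}_{1\times pm_1}\;\; \hat{\mathcal{H}}_{\hat{\nu}}^H\;\; \mathbf{0}_{1\times p(\nu-\hat{\nu}-m_1)}]^H$$

whose length equals the length of $\mathcal{H}_\nu$. Then, we define
$m_1^{*} = \arg\min_{m_1} \|\mathcal{H}_\nu - \hat{\mathcal{H}}_\nu^{(m_1)}\|_2$, where $\|\cdot\|_2$ denotes,
depending on the argument, the matrix or vector 2-norm.
That is, $pm_1^{*}$ is the number of leading zeros
we must insert in front of $\hat{\mathcal{H}}_{\hat{\nu}}$, so that the augmented
impulse response vector estimate becomes closest to
$\mathcal{H}_\nu$. In the sequel, we shall work with the augmented
impulse response vector estimate $\hat{\mathcal{H}}_\nu \triangleq \hat{\mathcal{H}}_\nu^{(m_1^{*})}$. We note
that working with $\hat{\mathcal{H}}_\nu$ instead of $\hat{\mathcal{H}}_{\hat{\nu}}$ amounts simply
to insertion of an extra delay of $m_1^{*}$ time units.

We consider our channel estimate as being good if
$\mathcal{H}_\nu$ and $\hat{\mathcal{H}}_\nu$ are close to each other. In terms of the
associated filtering matrices, we express this condition
as:

$$\Delta\mathbf{H} \triangleq \hat{\mathbf{H}} - \mathbf{H}, \qquad \|\Delta\mathbf{H}\|_2 \le \epsilon$$

where $\epsilon$ is a small positive number. The estimation
errors in the input and noise SOS can be expressed in
an analogous manner, that is

$$\Delta\mathbf{R}_{xx} \triangleq \hat{\mathbf{R}}_{xx} - \mathbf{R}_{xx}, \qquad \|\Delta\mathbf{R}_{xx}\|_2 \le \epsilon$$

$$\Delta\mathbf{R}_{nn} \triangleq \hat{\mathbf{R}}_{nn} - \mathbf{R}_{nn}, \qquad \|\Delta\mathbf{R}_{nn}\|_2 \le \epsilon.$$

Different quantities may be known with different accuracies,
i.e., the $\epsilon$'s in the above expressions may be
different. In this case, $\epsilon$ is the biggest of these values.

3.2. MMSE-DFE: Perturbation analysis

Under mismatch, efforts toward computation of $\mathbf{R}_{yy}$,
$\mathbf{R}_{yx}$ and $\mathbf{R}_{xy}$ lead to:

$$\hat{\mathbf{R}}_{yy} = \hat{\mathbf{H}}\hat{\mathbf{R}}_{xx}\hat{\mathbf{H}}^H + \hat{\mathbf{R}}_{nn}, \qquad \hat{\mathbf{R}}_{xy} = \hat{\mathbf{R}}_{xx}\hat{\mathbf{H}}^H = \hat{\mathbf{R}}_{yx}^H.$$

Then, efforts toward computing $\mathbf{R}$ and $\mathbf{R}_\Delta$ give:

$$\hat{\mathbf{R}} = \hat{\mathbf{R}}_{xx} - \hat{\mathbf{R}}_{xy}\hat{\mathbf{R}}_{yy}^{-1}\hat{\mathbf{R}}_{yx}$$

$$\hat{\mathbf{R}}_\Delta \triangleq [\mathbf{0}_{(N_b+1)\times\Delta}\;\; \mathbf{I}_{N_b+1}\;\; \mathbf{0}_{(N_b+1)\times s}]\;\hat{\mathbf{R}}\;\begin{bmatrix}\mathbf{0}_{\Delta\times(N_b+1)} \\ \mathbf{I}_{N_b+1} \\ \mathbf{0}_{s\times(N_b+1)}\end{bmatrix}.$$

The resulting "optimal" filters are given by

$$\hat{\mathbf{b}}_o = \frac{\hat{\mathbf{R}}_\Delta^{-1}\mathbf{e}_0}{\mathbf{e}_0^H\hat{\mathbf{R}}_\Delta^{-1}\mathbf{e}_0}, \qquad \hat{\mathbf{w}}_o = \hat{\mathbf{R}}_{yy}^{-1}\hat{\mathbf{R}}_{yx}\hat{\bar{\mathbf{b}}}_o \qquad (5)$$

where $\hat{\bar{\mathbf{b}}}_o$ is the appropriately zero-padded version of
$\hat{\mathbf{b}}_o$ (recall the definition of $\bar{\mathbf{b}}$ in terms of $\mathbf{b}$).

Assuming correct past decisions, the corresponding
MSE can be expressed as

$$\widehat{\mathrm{MMSE}} = E\left[(\hat{\bar{\mathbf{b}}}_o^H\mathbf{x} - \hat{\mathbf{w}}_o^H\mathbf{y})(\mathbf{x}^H\hat{\bar{\mathbf{b}}}_o - \mathbf{y}^H\hat{\mathbf{w}}_o)\right]$$

$$= \hat{\bar{\mathbf{b}}}_o^H\mathbf{R}_{xx}\hat{\bar{\mathbf{b}}}_o - \hat{\bar{\mathbf{b}}}_o^H\mathbf{R}_{xy}\hat{\mathbf{w}}_o - \hat{\mathbf{w}}_o^H\mathbf{R}_{yx}\hat{\bar{\mathbf{b}}}_o + \hat{\mathbf{w}}_o^H\mathbf{R}_{yy}\hat{\mathbf{w}}_o.$$

Substituting into the above equation the expression for
$\hat{\mathbf{w}}_o$ given in (5), we obtain

$$\widehat{\mathrm{MMSE}} = \hat{\bar{\mathbf{b}}}_o^H\,\mathcal{R}\,\hat{\bar{\mathbf{b}}}_o \qquad (6)$$

where

$$\mathcal{R} = \mathbf{R}_{xx} - \hat{\mathbf{R}}_{xy}\hat{\mathbf{R}}_{yy}^{-1}\mathbf{R}_{yx} - \mathbf{R}_{xy}\hat{\mathbf{R}}_{yy}^{-1}\hat{\mathbf{R}}_{yx} + \hat{\mathbf{R}}_{xy}\hat{\mathbf{R}}_{yy}^{-1}\mathbf{R}_{yy}\hat{\mathbf{R}}_{yy}^{-1}\hat{\mathbf{R}}_{yx}. \qquad (7)$$

Our aim is to assess the excess MSE, $\widehat{\mathrm{MMSE}} - \mathrm{MMSE}$,
introduced by the channel and SOS estimation errors.
To that end, we first relate $\mathcal{R}$ and $\mathbf{R}$.

Theorem 1: Matrices $\mathcal{R}$ and $\mathbf{R}$ satisfy

$$\mathcal{R} = \mathbf{R} + O(\epsilon^2). \qquad (8)$$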


The  proof,  which  is  can  be  constructed  by  using  first- 
order  perturbation  expansions,  can  be  found  in  [3]. 

We  can  now  relate  MMSE  and  MMSE. 
Theorem  2:  Quantities  MMSE  and  MMSE  sat¬ 
isfy 

MMSE  =  MMSE  +  0{c2). 

Proof:  We  define  Ab0  =  b0  -  b0.  Using  (6),  (8)  and 
(4)  and  ignoring  higher-order  error  terms,  we  obtain 


___  ~/r  ~  ~ 

MMSE  =  b0  7Zb0  =  b0  Rb0  =  bfRAb0 
=  (b0H  +  Ab")  Ra  (b0  +  Ab0) 

=  MMSE  +  (Ab" RAb0  +  bf  RAAb0) . 

From  the  definition  (3)  of  bo,  we  obtain  that  the  vec¬ 
tor  RAb0  is  a  multiple  of  e0.  By  construction,  the  first, 
element  of  Ab0  is  identically  zero,  since  the  first  ele¬ 
ment  of  both  b0  and  b0  is  1.  Thus,  the  terms  inside 
the  parenthesis  vanish,  to  prove  theorem  2.  □ 

Theorem  2  says  that  the  MMSE-DFE  is  very  robust 
with  respect  to  small  channel  and  SOS  mismatch. 
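
The second-order behaviour stated in Theorem 2 can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes real-valued signals, a hypothetical 3-tap channel `[1.0, -0.4, 0.2]`, $R_{xx} = I$, $R_{nn} = \sigma^2 I$ and a perturbation applied to $H$ only. The helper names (`toeplitz_channel`, `dfe_filters`, `true_mse`, `excess_mse`) are illustrative. The filters are designed from the perturbed channel and evaluated against the true statistics; doubling $\epsilon$ should roughly quadruple the excess MSE.

```python
import numpy as np

def toeplitz_channel(h, Nf):
    """Nf x (Nf + len(h) - 1) convolution matrix, so that y = H x + n."""
    nu = len(h) - 1
    H = np.zeros((Nf, Nf + nu))
    for i in range(Nf):
        H[i, i:i + nu + 1] = h
    return H

def dfe_filters(H, Rxx, Rnn, delay, Nb):
    """Zero-padded feedback filter b and feedforward filter w from given statistics."""
    Ryy = H @ Rxx @ H.T + Rnn
    Rxy = Rxx @ H.T
    R = Rxx - Rxy @ np.linalg.solve(Ryy, Rxy.T)
    win = slice(delay, delay + Nb + 1)
    v = np.linalg.solve(R[win, win], np.eye(Nb + 1)[:, 0])  # R_Delta^{-1} e_0
    b_pad = np.zeros(H.shape[1])
    b_pad[win] = v / v[0]                                   # first tap equals 1
    w = np.linalg.solve(Ryy, Rxy.T @ b_pad)
    return b_pad, w

def true_mse(b_pad, w, H, Rxx, Rnn):
    """MSE of the error b^T x - w^T y under the true (unperturbed) statistics."""
    Ryy = H @ Rxx @ H.T + Rnn
    Rxy = Rxx @ H.T
    return b_pad @ Rxx @ b_pad - 2 * b_pad @ Rxy @ w + w @ Ryy @ w

def excess_mse(eps, seed=0):
    """Excess MSE caused by a channel perturbation of spectral norm eps."""
    rng = np.random.default_rng(seed)
    Nf, Nb, delay, sigma2 = 8, 4, 3, 0.05        # assumed example values
    H = toeplitz_channel(np.array([1.0, -0.4, 0.2]), Nf)
    Rxx, Rnn = np.eye(H.shape[1]), sigma2 * np.eye(Nf)
    E = rng.standard_normal(H.shape)
    E /= np.linalg.norm(E, 2)                    # unit spectral norm direction
    b0, w0 = dfe_filters(H, Rxx, Rnn, delay, Nb)
    b1, w1 = dfe_filters(H + eps * E, Rxx, Rnn, delay, Nb)
    return true_mse(b1, w1, H, Rxx, Rnn) - true_mse(b0, w0, H, Rxx, Rnn)
```

Calling `excess_mse(1e-3)` and `excess_mse(2e-3)` with the same seed uses the same perturbation direction, so their ratio isolates the dependence on $\epsilon$.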


4.  SIMULATIONS 


In our simulations, we use the communication channel
whose impulse response is plotted in Fig. 3. It models
a multipath scenario, and is derived by oversampling,
by a factor of 2, the continuous-time channel impulse
response $h(t) = p(t) - 0.4\,p(t - 0.7)$, where $p(t)$ is the
truncated raised-cosine pulse, with roll-off factor $\beta = 0.22$.
The truncation interval is $[-3T, 3T]$, where $T$
denotes the symbol period, and the sampling instants
are the integer multiples of $T/2$. The input is a BPSK
signal, taking, with equal probability, the values $\pm 1$,
yielding $R_{xx} = I$. At the multi-channel output, we
add temporally and spatially white noise with variance
$\sigma^2$. Hence, $R_{nn} = \sigma^2 I$. We define the SNR as:


$$\mathrm{SNR} = 10 \log_{10} \frac{E\left[\|w_n\|_2^2\right]}{E\left[\|n_n\|_2^2\right]}$$
Figure 3: Fractionally sampled channel impulse response.


Figure 4: Excess MSE ('*-') versus $\epsilon$, for SNR = 12 dB; upper
and lower lines: functions $\epsilon$ and $\epsilon^2$, respectively.

where $w_n$ is the noiseless channel output at time $n$. Using
the knowledge of the channel impulse response and
the input and noise SOS, we compute the optimal filters
$b_o$ and $w_o$, as well as the MMSE, for $N_f = 8$, $N_b = 4$
and all possible delays. In order to relate the range of $\epsilon$,
for which our first-order analysis remains valid, to the
size of the unperturbed quantities, we perturb $\{h_i\}$,
$R_{xx}$ and $R_{nn}$ using random perturbations, such that:

$$\max\left( \|\Delta H\|_2, \|\Delta R_{xx}\|_2, \|\Delta R_{nn}\|_2 \right) = \epsilon.$$

The sizes of the unperturbed quantities are: $\|H\|_2 = 1.4580$,
$\|R_{xx}\|_2 = 1$ and $\|R_{nn}\|_2 = \sigma^2$. Using inaccurate
data, we compute $\tilde{b}_o$, $\tilde{w}_o$ and $\widetilde{\mathrm{MMSE}}$. In Fig.
4, we plot the excess MSE for (a typical) delay $\Delta = 3$
and SNR = 12 dB ($\sigma^2 \approx 0.049$), versus $\epsilon$. In the same
figure, we plot the functions $\epsilon$ and $\epsilon^2$ (upper and lower
line, respectively). We observe that for $\epsilon \in [0, \epsilon^*)$, with
$\epsilon^* \approx 0.0546$, the excess MSE shows a quadratic dependence
on $\epsilon$. That is, in this range, our first-order analysis
is valid and the excess MSE is remarkably small.

In Fig. 5, we depict the corresponding plots for
SNR = 25 dB ($\sigma^2 \approx 0.0025$). It is clear that, now, our
analysis is valid for a smaller range of $\epsilon$, i.e., $\epsilon \in [0, \epsilon^*)$,
with $\epsilon^* \approx 0.0026$. We observe that, in both cases, our


Figure 5: Excess MSE ('*-') versus $\epsilon$, for SNR = 25 dB; upper
and lower lines: functions $\epsilon$ and $\epsilon^2$, respectively.

analysis remains valid for $\epsilon \in [0, \epsilon^*)$, with $\epsilon^* \approx \sigma^2$.
Thus, for fixed $H$ and $R_{xx}$, the range of $\epsilon$ for which
our analysis remains valid shrinks as the SNR increases.

In order to isolate the effects of the estimation errors
in each one of the quantities of interest, i.e., $\{h_i\}$,
$R_{xx}$ and $R_{nn}$, we performed experiments where we perturbed
only one quantity at a time [3]. We observed
that for $\epsilon \in [0, \epsilon^*)$, with $\epsilon^* \approx 0.3$, the MMSE-DFE
was insensitive to estimation errors in $H$ and $R_{xx}$, because
the excess MSE induced by size-$\epsilon$ perturbations
on these quantities was very close to, and in many cases
smaller than, $\epsilon^2$. We observed that this happened irrespective
of the SNR. On the other hand, the MMSE-DFE
was more sensitive to errors occurring in $R_{nn}$, especially
at high SNR. This was to be expected since, for
fixed $\|H\|_2$ and $\|R_{xx}\|_2$, $\|R_{nn}\|_2$ becomes smaller as
the SNR increases. Hence, the size of the perturbations
$\|\Delta R_{nn}\|_2$ tolerated by a first-order analysis decreases
as well.
Abstract: A frequency domain approach for evaluating
estimation bounds (Cramer-Rao bounds) of the parameters of a
base-band communication channel model is presented. It is
assumed that the estimation process uses 2nd order statistics of
an over-sampled signal and exploits the cyclo-stationary
properties of the signal. The described frequency domain approach
provides  useful  insight  into  the  channel  estimation  problem  and 
is  independent  of  the  adopted  parameter  estimation  technique. 
Furthermore,  the  proposed  frequency  domain  technique  does  not 
depend  on  the  noise  probability  density  function  and  could  be 
easily  extended  for  evaluating  the  estimation  performance  in  the 
presence  of  colored  noise. 


1.  Introduction 

The inter-symbol interference resulting from the presence of
multi-path in digital communication channels needs to be
equalized for successful data transfer. Classically, equalization is
achieved using a training sequence, but this has the drawback of
reducing the data rate. The increasing interest in digital mobile
communication and digital broadcasting requires high data rates,
and thus blind equalization techniques are preferred over the use
of training sequences. Blind channel equalization could be
achieved via adaptive algorithms; however, they result in very
slow convergence rates. To overcome these difficulties, fast
blind channel estimation and equalization methods have been
proposed, which are obtained using the second-order statistics of
the process [1][2].

By over-sampling the signal above the Nyquist rate and using the
cyclo-stationary properties of the process, various second-order
statistics based channel estimation algorithms have been
described in the literature [4]. Attempts have also been made to
obtain  the  bounds  of  such  estimation  methods.  Most  of  these 
bounds  are  obtained  using  time  domain  techniques  that  are 
tedious  and  also  depend  on  the  method  used  for  estimation.  This 
paper  proposes  to  evaluate  the  estimation  bounds  using  a 
frequency  domain  approach  and  describes  the  procedure  in 
detail.  A  simulation  example  is  shown  to  justify  the  assumptions 
used  in  deriving  the  bounds. 

2.  Channel  Model 

Consider  a  baseband  communication  system  with  the  following 
channel  model. 

$$x(t) = \sum_{\alpha=-\infty}^{\infty} h(\alpha)\, u(t - \alpha) + n(t), \qquad (1)$$

where $t$ takes discrete values and it is assumed that the
sampling rate is normalized to 1; $h(\cdot)$ is the channel impulse
response; and $n(t)$ is an additive noise process. In equation (1),
the information symbols, $s_k$, that are transmitted at $T$ intervals
are given by

$$u(t) = \sum_{k=-\infty}^{\infty} s_k\, \delta(t - kT). \qquad (2)$$

Equation  (1)  can  be  equivalently  expressed  in  the  frequency 
domain  as 

$$X(\omega) = H(\omega) U(\omega) + N(\omega). \qquad (3)$$

In the following discussion we will demonstrate how the channel
frequency response $H(\omega)$ can be estimated using $X(\omega)$.

Assume that $X(\omega)$ is estimated from the received signal $x(t)$
via a DFT of length $L$ (selected as an integer multiple
of $T$). That is, $X(\omega)$ is estimated at $L$ distinct frequency
points in the frequency interval $(0, 2\pi)$, given as

$$X(k) = X(\omega)\big|_{\omega = 2\pi k / L} = H(k) U(k) + N(k), \qquad 0 \le k \le L-1. \qquad (4)$$

We  will  now  examine  some  correlation  properties  of  X(k)  .  In 
order  to  do  this  consider  the  correlation  properties  of  U(k) . 

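3. Correlation Properties of U(k)

$U(k)$ is given by

$$U(k) = \sum_{n=0}^{L-1} u(n)\, e^{-j n k 2\pi / L}. \qquad (5)$$

Consider the product $[U(k_1) U^*(k_2)]$, and evaluate its statistical
mean as

$$E[U(k_1) U^*(k_2)] = \sum_{n=0}^{L-1} \sum_{m=0}^{L-1} E[u(n) u^*(m)]\, e^{-j (n k_1 - m k_2) 2\pi / L}. \qquad (6)$$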

By substituting for $u(n)$ from equation (2) and using the
cyclo-stationary properties we get

$$E[u(n) u^*(m)] = \sum_{k=-\infty}^{\infty} \sum_{l=-\infty}^{\infty} E[s_k s_l^*]\, \delta(n - kT)\, \delta(m - lT) = \delta(n - m) \sum_{k=-\infty}^{\infty} \delta(n - kT). \qquad (7)$$

Substituting this in equation (6) we get

$$E[U(k_1) U^*(k_2)] = \sum_{n=0}^{L-1} \sum_{k=-\infty}^{\infty} \delta(n - kT)\, e^{-j n (k_1 - k_2) 2\pi / L}. \qquad (8)$$

Equation (8) can be simplified to obtain

$$E[U(k_1) U^*(k_2)] = \sum_{n=0}^{L/T - 1} e^{-j n (k_1 - k_2) 2\pi T / L}. \qquad (9)$$

Note that the right hand side of equation (9) approaches zero,
asymptotically, if $(k_1 - k_2)$ is not an integer multiple of $L/T$.
Therefore, in the limit $L \to \infty$,
$$E[U(k_1) U^*(k_2)] = \begin{cases} L/T, & (k_1 - k_2) = \ell L/T \\ 0, & \text{otherwise} \end{cases} \qquad (10)$$

where $\ell$ denotes an integer. Now, choosing $k_2 = k_1 + \ell_1 L/T$
and $k_4 = k_3 + \ell_2 L/T$, the following expression for the covariance
of $[U(k_1) U^*(k_2)]$ can also be obtained (see [5] for details).

Co  var  iance\j  ( A, )!/  *  (k2),U  (k3)U’  (k4 )] 


= ~  [5(*. -ki-£'L/T)  +  5(kl+  A,  -  £*  L/T)\  .  (11) 

Again, $\ell'$ and $\ell''$ in equation (11) denote integers.
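
The periodic correlation structure described by equation (10) can be illustrated with a short Monte Carlo experiment. This is only a sketch under stated assumptions: unit-power BPSK symbols, one symbol every $T$ samples, and the unnormalized DFT of `numpy.fft.fft` (with a different DFT normalization the non-zero value changes accordingly). The function names are illustrative and do not come from the paper.

```python
import numpy as np

def symbol_train_dft(L, T, rng):
    """DFT of u(n) = sum_k s_k delta(n - kT) over one length-L block (BPSK s_k)."""
    u = np.zeros(L)
    u[::T] = rng.choice([-1.0, 1.0], size=L // T)
    return np.fft.fft(u)

def correlation_estimate(L, T, k1, k2, trials, seed=0):
    """Sample mean of U(k1) U*(k2) over independent symbol blocks."""
    rng = np.random.default_rng(seed)
    acc = 0.0 + 0.0j
    for _ in range(trials):
        U = symbol_train_dft(L, T, rng)
        acc += U[k1] * np.conj(U[k2])
    return acc / trials
```

For bins separated by a multiple of $L/T$ the estimate settles near $L/T$, while for other bin pairs it stays near zero. With this signal model $U(k)$ is in fact exactly periodic in $k$ with period $L/T$.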

Assuming $n(t)$ of equation (1) to be an independent white Gaussian
noise sequence of variance $\sigma^2$, the following relations can be
obtained for the noise spectrum $N(k)$ [5]:


:[A(Aq)A>2)j=aA5(Iq-/;2) 

E[A(Aq )  AV2  )A^(fc3  W(*4)]  = 

(12) 

^4|"<5(Al  -  k2)8(k2  -  A4)  + 

1  (13) 

- k3 )S(k2  -  A4 )  +  <5(  A,  )S(k2 )S(k2 )S(k4 ) 


4.  Channel  Estimation  using  the  Data  Covariance  Matrix 

Now consider the input data covariance matrix $R_x(t, \tau)$, which
consists of elements $r_x(t, \tau)$ that are defined as

$$r_x(t, \tau) = E[x(t)\, x^*(t - \tau)]. \qquad (14)$$

(Note  that  above  data  covariance  matrix  is  used  for  channel 
identification  using  the  subspace  method  of  channel  estimation 
[2].) 


Performing a 2-d Discrete Fourier Transform (DFT) on $R_x(t, \tau)$
we obtain

$$\Gamma_x^k(\nu) \;\overset{\text{2-d DFT}}{\Longleftrightarrow}\; r_x(t, \tau), \qquad \tau \to \nu, \quad t \to k. \qquad (15)$$


It is noted here that we can estimate the performance of channel
estimation, e.g. Cramer-Rao bounds (CRBs) of estimates, by
using either $r_x(t, \tau)$ or $\Gamma_x^k(\nu)$. Both methods would result in
identical measures, as the transformation in equation (15) is
one-to-one. It is further noted here that in all reported literature the
CRBs are obtained in the time domain using $R_x(t, \tau)$. In the
following we propose to estimate the CRBs of channel estimation
in the frequency domain using $\Gamma_x^k(\nu)$.

To do this we first note that $\Gamma_x^k(\nu)$ can be expressed as [1]

$$\Gamma_x^k(\nu) = X(\nu + k 2\pi / T)\, X^*(\nu), \qquad 0 \le k \le T-1. \qquad (16)$$

Suppose $\Gamma_x^k(\nu)$ is evaluated using $L$ data samples at frequency
points spaced $2\pi/L$ apart, i.e. at $\nu = m 2\pi / L$ for $0 \le m < L$; we get

$$\Gamma_x^k(m) = X(m 2\pi / L + k 2\pi / T)\, X^*(m 2\pi / L). \qquad (17)$$


Now using the results of the previous section, i.e. equations (3) and
(10) - (13), the following can be obtained.

$$E[\Gamma_x^k(m)] = H((m + k L/T)\, 2\pi / L)\, H^*(m 2\pi / L) + T \sigma^2 \delta(k) \qquad (18)$$

and 


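$$\mathrm{Covariance}\left[ \Gamma_x^{k_1}(m_1), \Gamma_x^{k_2}(m_2) \right] =
\frac{1}{M}\, H((m_1 + k_1 L/T)\, 2\pi / L)\, H^*(m_1 2\pi / L) \times H((m_2 + k_2 L/T)\, 2\pi / L)\, H^*(m_2 2\pi / L)
\times \left[ \delta(m_1 - m_2 - \ell' L/T) + \delta(m_1 + m_2 - \ell'' L/T) \right]
+ \frac{T^2 \sigma^4}{M}\, \delta(k_1 - k_2)\, \delta(m_1 - m_2) \times \left[ 1 + \delta(k_1) \delta(m_1) + \delta(k_1) \delta(m_1 - L) \right] \qquad (19)$$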


where $\sigma^2$ is the power of the additive white Gaussian noise
process. Note that in the above equations (18) and (19) it is assumed
that $M$ data segments have been used in the estimation of
$R_x(t, \tau)$.
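
As an illustration of equations (17) and (18), the sketch below averages $X(m + kL/T)\,X^*(m)$ over $M$ segments and compares the result with the product of channel responses. It rests on several assumptions that are not from the paper: circular convolution within each segment (so that $X(k) = H(k)U(k) + N(k)$ holds exactly), real BPSK symbols, real white noise, a hypothetical 4-tap channel, and the unnormalized `numpy.fft.fft`, which scales the signal term by $L/T$ relative to the normalization implied by equation (18).

```python
import numpy as np

def cyclic_spectrum_estimate(h, L, T, M, sigma, k, seed=0):
    """Average Gamma^k(m) = X(m + k L/T) X*(m) over M independent length-L segments."""
    rng = np.random.default_rng(seed)
    Hf = np.fft.fft(h, L)                  # channel response on the L-point grid
    shift = k * L // T                     # frequency offset of k*2*pi/T in bins
    gamma = np.zeros(L, dtype=complex)
    for _ in range(M):
        u = np.zeros(L)
        u[::T] = rng.choice([-1.0, 1.0], size=L // T)
        noise = sigma * rng.standard_normal(L)
        X = Hf * np.fft.fft(u) + np.fft.fft(noise)  # circular-convolution assumption
        gamma += np.roll(X, -shift) * np.conj(X)    # X[(m + shift) mod L] * conj(X[m])
    return gamma / M, Hf
```

For $k \ne 0$ the noise term averages out, and the estimate approaches $(L/T)\,H((m + kL/T)2\pi/L)\,H^*(m2\pi/L)$ as $M$ grows.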


5. Probability Distribution of $\Gamma_x^k(m)$

We have presented the expectation and covariance properties of
$\Gamma_x^k(m)$ in equations (18) and (19), respectively. In this section we
will investigate the probability distribution of $\Gamma_x^k(m)$. Typically,
the number of data segments $M$ used in the estimation of
$R_x(t, \tau)$ is large. Hence we can assume that the probability
distribution of $\Gamma_x^k(m)$ obeys a complex Gaussian distribution. The
following simulation results justify this claim. The simulation
results of Figure 1 are obtained using the channel model
discussed in reference [2]. A binary independent symbol
sequence ($\pm 1$) (BPSK) was used in the simulations. (The
number of virtual channels $T = 4$; the width of the DFT temporal
window $L = 10 \times T$; the degree of ISI = 4; the number of data
symbols used = 500; SNR = 30 dB.) The probability distribution
of $\Gamma_x^k(m)$ (at $k = 1$, $m = 11$) was obtained using 10,000 iterations.
It can be seen from Figure 1 that both the real and imaginary parts of
$\Gamma_x^k(m)$ obey a Gaussian distribution.


6.  Cramer-Rao  Bounds  for  Channel  Estimation 

For the derivation of the Cramer-Rao bound (CRB) for the channel
estimation, consider $\Gamma_x^k(m)$ as the complex observation vector,

$$y = \Gamma_x^k(m), \qquad 0 \le m \le L-1, \quad 0 \le k \le T-1, \qquad (20)$$
7. Discussion

A  frequency  domain  approach  for  evaluating  estimation  bounds 
(Cramer-Rao  bounds)  of  a  base-band  communication  channel 
model  parameters  is  presented  in  the  paper.  The  frequency 
domain  approach  of  evaluating  the  CRB  is  the  novel  part  of  the 
reported work. Usually the CRB is estimated in the time domain
using the correlation matrix $R_x(t, \tau)$. The CRB evaluated from either
$R_x(t, \tau)$ or $\Gamma_x^k(m)$ should produce identical results, as the latter
is the 2-d Fourier transform of the former. Moreover, the
described  frequency  domain  approach  provides  useful  insight 
into  the  channel  estimation  problem  and  also  is  independent  of 
the  adopted  parameter  estimation  technique.  The  proposed 
frequency  domain  technique  does  not  depend  on  the  noise 
probability  density  function.  Furthermore,  the  CRB  derivation 
technique  could  be  easily  extended  for  evaluating  the  estimation 
performance  in  the  presence  of  colored  noise,  by  modifying 
equations  (12)  and  (13). 


Gaussian distribution. Simulations for both real and
imaginary parts of $\Gamma_x^k(m)$ are shown.
ABSTRACT

In this paper, we discuss the application of sequential
importance sampling (SIS) to joint channel estimation
and decoding of space-time trellis codes (STTCs).
First, we present the dynamic state space model (DSSM)
of the system, and then we briefly review the theory of
SIS. A special case of SIS, a combination of SIS and
Kalman filtering, is shown through simulations to be a
viable approach to the problem which addresses
time-varying flat-fading environments. Our solution is
admissible if phase ambiguity can be avoided. We show
that the phase ambiguity can be reduced by carefully
designing the STTC modulation constellations.

1.  INTRODUCTION 

Space-time coding (STC), introduced by Tarokh et al.
[1], exploits spatial and temporal diversity and thus provides
a framework for increased data rates in wireless
communications. Among families of space-time codes,
STTCs have more advantages than ST block codes, as
pointed out in [1]. It is generally assumed that STC
will be used in fading environments. Therefore, while
decoding, it is necessary to estimate the channel state
information (CSI), i.e., the fading coefficients of the
channel. In most of the literature, it is assumed
that the CSI is available through pilot signals sent
periodically from the transmit to the receive
side. Here we consider the problem of joint estimation
of the CSI and decoding of STTC when pilot signals
are not available. Because the problem is highly nonlinear,
conventional methods such as the extended Kalman filter
are hard to apply to it. Recently,
in [2] and [4], SIS has been used to address this type
of communication problem. In [2], a scheme was developed
for the joint detection and decoding of convolutional
codes via a combination of SIS and Kalman
filtering. In this paper, we generalize their scheme and
apply it to STTC decoding when the channel is assumed
to be flat-fading, time-varying, and is modeled
Figure 1: Space-Time Trellis Code System

by auto-regressive moving-average (ARMA) processes.
We propose a quadruple 8-PSK constellation for a delay
diversity STTC that greatly reduces the effect of phase
ambiguity on the symbol error rate (SER).

2.  SYSTEM  DESCRIPTION 

Suppose a communication system employs $N$ transmit
antennas and $M$ receive antennas as in Figure 1. A
sequence of user data symbols, $s_0, \ldots, s_t$, where $s_t \in A$
and $A$ is the set of all possible user data symbols, is
put through a space-time trellis encoder. The new user
state vector of the space-time trellis encoder at time $t$ is
determined according to the state transition equation,

$$\mathbf{s}_t = Z(\mathbf{s}_{t-1}, s_t) \qquad (1)$$

where $\mathbf{s}_{t-1}$ is the previous user state and $s_t$ is the new
user symbol. Based on the current user state, the encoder
generates a code vector that consists of $N$ symbols,
$\mathbf{c}(\mathbf{s}_t)^T = [c_1(\mathbf{s}_t) \ldots c_N(\mathbf{s}_t)]$, to be transmitted by
the antennas, where $c_i(\cdot)$ denotes the modulation function
for the $i$th antenna.

Let $\alpha_{nm,t}$ be the fading coefficient from the $n$th
transmit antenna to the $m$th receive antenna at time
$t$. The fading coefficient can be modeled as an ARMA
process that matches the power spectral density of the
channel. An ARMA($r_1$, $r_2$) process can be represented
as

$$\alpha_{nm,t} + \phi_1 \alpha_{nm,t-1} + \cdots + \phi_{r_1} \alpha_{nm,t-r_1} = \rho_0 u_{nm,t} + \cdots + \rho_{r_2} u_{nm,t-r_2} \qquad (2)$$
where $u_{nm,t}$ is an i.i.d. random complex Gaussian process
that drives the ARMA process, and $\{\phi_i\}$ and $\{\rho_i\}$
are known AR and MA coefficients. We assume that
all channel coefficients have the same power spectral
density and therefore their AR and MA coefficients are
identical.

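The next section shows that a DSSM representation
of the STTC system is convenient for the derivation
of the SIS algorithm. For convenience, we assume
$r_1 = r_2 = r$; otherwise zeros can be padded to the
coefficients to make the orders equal. Also, we introduce
the channel state vector $\mathbf{h}_{nm,t}^T = [h_{nm,t} \ldots h_{nm,t-r}]$,
which is of dimension $(r + 1)$. Then a state transition
equation can be constructed according to

$$\mathbf{h}_{nm,t} = \mathbf{F}_0 \mathbf{h}_{nm,t-1} + \mathbf{g}\, u_{nm,t} \qquad (3)$$

where

$$\mathbf{F}_0 = \begin{bmatrix} -\phi_1 & -\phi_2 & \cdots & -\phi_r & 0 \\ 1 & 0 & \cdots & 0 & 0 \\ \vdots & \ddots & & & \vdots \\ 0 & \cdots & 0 & 1 & 0 \end{bmatrix}, \quad \text{and} \quad \mathbf{g} = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}.$$

The fading coefficient is then represented as

$$\alpha_{nm,t} = \mathbf{o}^T \mathbf{h}_{nm,t}$$

where $\mathbf{o}^T = [\rho_0\ \rho_1 \cdots \rho_r]$.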
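
The equivalence between the ARMA recursion (2) and the state-space form (3) can be checked directly. The sketch below is illustrative only: the coefficient values used in the usage note are hypothetical, zero initial conditions are assumed, and the function names are not from the paper.

```python
import numpy as np

def arma_state_space(phi, rho):
    """Build F0, g, o for equal AR/MA order r (phi: r AR coeffs, rho: r+1 MA coeffs)."""
    r = len(phi)
    F0 = np.zeros((r + 1, r + 1))
    F0[0, :r] = -np.asarray(phi)     # first row: -phi_1 ... -phi_r, 0
    F0[1:, :-1] = np.eye(r)          # shift register below the first row
    g = np.zeros(r + 1)
    g[0] = 1.0
    o = np.asarray(rho, dtype=float)
    return F0, g, o

def simulate_state_space(phi, rho, u):
    """alpha_t = o^T h_t with h_t = F0 h_{t-1} + g u_t, starting from h = 0."""
    F0, g, o = arma_state_space(phi, rho)
    h = np.zeros(len(g), dtype=complex)
    alpha = np.empty(len(u), dtype=complex)
    for t, ut in enumerate(u):
        h = F0 @ h + g * ut
        alpha[t] = o @ h
    return alpha

def simulate_direct(phi, rho, u):
    """Direct ARMA recursion (2) with zero initial conditions."""
    alpha = np.zeros(len(u), dtype=complex)
    for t in range(len(u)):
        acc = sum(rho[k] * u[t - k] for k in range(len(rho)) if t - k >= 0)
        acc -= sum(phi[i - 1] * alpha[t - i]
                   for i in range(1, len(phi) + 1) if t - i >= 0)
        alpha[t] = acc
    return alpha
```

With, for example, `phi = [-1.2, 0.5]` and `rho = [1.0, 0.4, 0.1]` and a complex Gaussian driving sequence `u`, both functions return the same fading sequence up to rounding.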

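Now we arrange all the fading coefficients at time
$t$ into a single $NM \times 1$ vector,
$\boldsymbol{\alpha}_t = [\alpha_{11,t} \cdots \alpha_{N1,t} \cdots \alpha_{1M,t} \cdots \alpha_{NM,t}]^T$,
and define the $NM(r + 1) \times 1$ channel state vector as
$\mathbf{h}_t = [\mathbf{h}_{11,t}^T \cdots \mathbf{h}_{N1,t}^T \cdots \mathbf{h}_{1M,t}^T \cdots \mathbf{h}_{NM,t}^T]^T$.
Then we can express all the fading coefficients
in a compact form, i.e.,

$$\mathbf{h}_t = \mathbf{F} \mathbf{h}_{t-1} + \mathbf{G} \mathbf{u}_t \qquad (4)$$

$$\boldsymbol{\alpha}_t = \mathbf{O} \mathbf{h}_t \qquad (5)$$

where $\mathbf{F}$, $\mathbf{G}$, and $\mathbf{O}$ are obtained from $\mathbf{F}_0$, $\mathbf{g}$, and $\mathbf{o}$,
respectively. For example,

$$\mathbf{O} = \begin{bmatrix} \mathbf{o}^T & \mathbf{0}^T & \cdots & \mathbf{0}^T \\ \vdots & & \ddots & \vdots \\ \mathbf{0}^T & \mathbf{0}^T & \cdots & \mathbf{o}^T \end{bmatrix}$$

where $\mathbf{0}$ is an $(r + 1) \times 1$ all-zero vector, and the matrix
$\mathbf{O}$ is of dimension $NM \times NM(r + 1)$.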

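With ideal time and frequency information and flat
fading, the received signals are simply the product of
fading coefficients and user signals embedded in noise.
At time $t$ we can write

$$\mathbf{y}_t = \mathbf{C}(\mathbf{s}_t) \boldsymbol{\alpha}_t + \mathbf{v}_t = \mathbf{C}(\mathbf{s}_t) \mathbf{O} \mathbf{h}_t + \mathbf{v}_t \qquad (6)$$

where $\mathbf{y}_t = [y_{1,t} \cdots y_{M,t}]^T$ is the received signal vector
at all $M$ receive antennas, and $\mathbf{v}_t = [v_{1,t} \cdots v_{M,t}]^T$
is the observation noise vector. The code matrix is
an $M \times NM$ matrix constructed from the code vector
$\mathbf{c}(\mathbf{s}_t)$, or

$$\mathbf{C}(\mathbf{s}_t)^T = \begin{bmatrix} \mathbf{c}(\mathbf{s}_t) & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{c}(\mathbf{s}_t) & \cdots & \mathbf{0} \\ \vdots & & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \mathbf{c}(\mathbf{s}_t) \end{bmatrix}$$

where $\mathbf{0}$ is an $N \times 1$ all-zero vector. The state transition
equations (1) and (4) and the observation equation (6)
together form the DSSM representation of the STTC
system. We can see that the DSSM describes both the
channel state vector $\mathbf{h}_t$ and the user state vector $\mathbf{s}_t$ as
Markov processes hidden in the observation $\mathbf{y}_t$.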
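
The block structure of the observation equation (6) can be made concrete with Kronecker products. This is a minimal sketch, assuming the antenna ordering of $\boldsymbol{\alpha}_t$ given above (all transmit antennas for receive antenna 1 first, and so on); the function names are illustrative and not from the paper.

```python
import numpy as np

def code_matrix(c, M):
    """M x NM code matrix C(s_t): block diagonal with c^T repeated M times."""
    return np.kron(np.eye(M), np.asarray(c).reshape(1, -1))

def output_matrix(o, N, M):
    """NM x NM(r+1) matrix O: block diagonal with o^T repeated NM times."""
    return np.kron(np.eye(N * M), np.asarray(o).reshape(1, -1))

def observe(c, o, h, M, noise):
    """Observation y_t = C(s_t) O h_t + v_t for one time instant."""
    N = len(c)
    return code_matrix(c, M) @ output_matrix(o, N, M) @ h + noise
```

Each entry of the result is the sum over transmit antennas of the code symbol times the corresponding fading coefficient, plus noise, which is what the flat-fading model describes for one receive antenna.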


3.  COMBINED  SIS  AND  KALMAN 
FILTERING 

In this section, we review the theory of combined SIS
and Kalman filtering, which we propose to apply to
the STTC system. Define $\mathbf{s}_{0:t} = \{\mathbf{s}_0, \ldots, \mathbf{s}_t\}$ as the set
of all user state vectors up to time $t$, and let $\mathbf{y}_{0:t}$ be
defined similarly. In the Bayesian framework, all the
information about the user data symbols is contained
in the posterior density, $p(\mathbf{s}_{0:t} | \mathbf{y}_{0:t})$. The evaluation
of the expected value of a function of the user states
$\phi(\mathbf{s}_{0:t})$, using the posterior density, involves high
dimensional integration and is almost impossible to carry
out analytically. However, if we have samples from the
posterior density, $\mathbf{s}_{0:t}^{(j)}$, where $j = 1, 2, \ldots, J$ is the sample
index, we can approximate the expectation using
Monte Carlo integration

$$E[\phi(\mathbf{s}_{0:t}) | \mathbf{y}_{0:t}] \approx \frac{1}{W_t} \sum_{j=1}^{J} w_t^{(j)} \phi(\mathbf{s}_{0:t}^{(j)}) \qquad (7)$$

where $w_t^{(j)}$ is the weight associated with the $j$th sample
and $W_t = \sum_{j=1}^{J} w_t^{(j)}$ is the sum of the weights. Most
of the time, however, taking samples from $p(\mathbf{s}_{0:t} | \mathbf{y}_{0:t})$
is a difficult task itself, and we have to resort to importance
sampling, i.e., drawing samples from an importance
function $\pi(\mathbf{s}_{0:t} | \mathbf{y}_{0:t})$ that may render the task
easier. Then, samples drawn from the importance function
are weighted according to

$$w_t^{(j)} = \frac{p(\mathbf{s}_{0:t}^{(j)} | \mathbf{y}_{0:t})}{\pi(\mathbf{s}_{0:t}^{(j)} | \mathbf{y}_{0:t})} \qquad (8)$$

However, even direct importance sampling from the
distribution $\pi(\mathbf{s}_{0:t} | \mathbf{y}_{0:t})$ is difficult. Fortunately, the
posterior density function can be factored, that is,

$$p(\mathbf{s}_{0:t} | \mathbf{y}_{0:t}) \propto p(\mathbf{s}_{0:t-1} | \mathbf{y}_{0:t-1}) \times p(\mathbf{y}_t | \mathbf{s}_t)\, p(\mathbf{s}_t | \mathbf{s}_{t-1}) \qquad (9)$$

and this provides us with a possibility to evaluate the
posterior recursively. Indeed, if we select an importance
function in the form of

$$\pi(\mathbf{s}_{0:t} | \mathbf{y}_{0:t}) = \pi(\mathbf{s}_{0:t-1} | \mathbf{y}_{0:t-1})\, \pi(\mathbf{s}_t) \qquad (10)$$


as  new  observations  become  available,  we  can  evaluate 
the  importance  weights  recursively. 

The  SIS  algorithm  can  be  implemented  according 
to  the  following  scheme: 


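For $t = 0, 1, 2, \ldots$

1. Sample $\mathbf{s}_t^{(j)} \sim \pi(\mathbf{s}_t)$ and set $\mathbf{s}_{0:t}^{(j)} = (\mathbf{s}_{0:t-1}^{(j)}, \mathbf{s}_t^{(j)})$,
where $j = 1, \ldots, J$.

2. Evaluate the importance weights according to (8).
It can be shown that when the importance function
is in the form of (10), the weights can be
obtained recursively from

$$w_t^{(j)} = w_{t-1}^{(j)} \cdot \frac{p(\mathbf{y}_t | \mathbf{s}_t^{(j)})\, p(\mathbf{s}_t^{(j)} | \mathbf{s}_{t-1}^{(j)})}{\pi(\mathbf{s}_t^{(j)})} \qquad (11)$$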
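
The recursion can be sketched on a toy problem. The example below is not the STTC receiver: it assumes a small discrete-state hidden Markov model with known Gaussian observation means, a uniform initial state distribution, and the transition prior $p(\mathbf{s}_t | \mathbf{s}_{t-1})$ as the importance function, so that the weight update (11) reduces to multiplying by the likelihood. All names and parameter values are illustrative. The exact forward recursion is included only as a reference for comparison.

```python
import numpy as np

def gaussian_lik(y, mean, sigma):
    """Scalar Gaussian likelihood p(y | state) with state-dependent mean."""
    return np.exp(-0.5 * ((y - mean) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def sis_marginal(y, P, means, sigma, J, seed=0):
    """SIS estimate of p(s_t | y_{0:t}) at the final time, without resampling."""
    rng = np.random.default_rng(seed)
    K = P.shape[0]
    s = rng.integers(K, size=J)                   # s_0 drawn from a uniform prior
    w = gaussian_lik(y[0], means[s], sigma)       # initial weights
    for t in range(1, len(y)):
        cdf = np.cumsum(P[s], axis=1)             # row-wise transition CDFs
        s = np.minimum((rng.random(J)[:, None] > cdf).sum(axis=1), K - 1)
        w = w * gaussian_lik(y[t], means[s], sigma)   # weight update (11)
    return np.bincount(s, weights=w / w.sum(), minlength=K)

def exact_marginal(y, P, means, sigma):
    """Exact filtering distribution from the forward recursion, for reference."""
    alpha = np.full(P.shape[0], 1.0 / P.shape[0]) * gaussian_lik(y[0], means, sigma)
    for t in range(1, len(y)):
        alpha = (alpha @ P) * gaussian_lik(y[t], means, sigma)
    return alpha / alpha.sum()
```

For short observation records the two results agree closely. For longer records the weights degenerate, which is why a resampling step is normally added.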


An important characteristic of transmitted signals
in some communication systems, such as the one addressed
here, is that future observations $\mathbf{y}_{t+1:t+p}$ often
hold information about the current user state vector
$\mathbf{s}_t$. As a result, another posterior density of interest
is $p(\mathbf{s}_{0:t} | \mathbf{y}_{0:t+p})$. One can obtain it by first finding
$p(\mathbf{s}_{0:t+p} | \mathbf{y}_{0:t+p})$ and then marginalizing with respect
to $\mathbf{s}_{t+1:t+p}$. The density $p(\mathbf{s}_{0:t+p} | \mathbf{y}_{0:t+p})$ can be
approximated using the combined SIS and Kalman filtering
algorithm described above, which is equivalent
to the delayed weight method in [2]. Another way of
solving this problem is to use the delayed importance
function $p(\mathbf{s}_t | \mathbf{s}_{0:t-1}^{(j)}, \mathbf{y}_{0:t+p})$, which takes into account
all the relevant observations. The drawback of this
importance function lies in its computational complexity,
as we can see from the expression for the density
$p(\mathbf{s}_t | \mathbf{s}_{0:t-1}^{(j)}, \mathbf{y}_{0:t+p})$, where


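The selection of the importance function affects the
efficiency of the SIS algorithm in terms of the number
of samples needed to approximate the distributions of
interest. The optimum importance function is

$$\pi(\mathbf{s}_t) = p(\mathbf{s}_t | \mathbf{s}_{0:t-1}^{(j)}, \mathbf{y}_{0:t})$$

and a proof of this is provided in [3]. In our case, when
the CSI is unknown, the optimum importance function
can be further expanded according to

$$p(\mathbf{s}_t | \mathbf{s}_{0:t-1}^{(j)}, \mathbf{y}_{0:t}) = \int p(\mathbf{s}_t, \mathbf{h}_t | \mathbf{s}_{0:t-1}^{(j)}, \mathbf{y}_{0:t})\, d\mathbf{h}_t
\propto \int p(\mathbf{y}_t | \mathbf{s}_t, \mathbf{h}_t)\, p(\mathbf{h}_t | \mathbf{s}_{0:t-1}^{(j)}, \mathbf{y}_{0:t-1})\, d\mathbf{h}_t \times p(\mathbf{s}_t | \mathbf{s}_{0:t-1}^{(j)}). \qquad (12)$$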

Inspecting (4) and (6), we can see that the DSSM, given
the user state vectors, is linear and Gaussian in the
channel state vectors. Hence, the Gaussian probability
density function $p(\mathbf{h}_t | \mathbf{s}_{0:t-1}^{(j)}, \mathbf{y}_{0:t-1})$ can be obtained by
computing the mean and the covariance matrix of $\mathbf{h}_t$
using the prediction step of the Kalman filter. Because
the likelihood function $p(\mathbf{y}_t | \mathbf{s}_t, \mathbf{h}_t)$ is Gaussian as well,
the integration in (12) can be carried out analytically.
As a result, we propose to use a combined SIS and
Kalman filtering algorithm, which is conceptually the
same as the one from [2]. The algorithm is composed
of the following steps:

1. For $j = 1, \ldots, J$, use the prediction step of the
Kalman filter to obtain $p(\mathbf{h}_t | \mathbf{s}_{0:t-1}^{(j)}, \mathbf{y}_{0:t-1})$ and
evaluate the proposal density using (12).

2. Draw samples and compute their weights as in
the case of common SIS algorithms.

3. For $j = 1, \ldots, J$, use the update step of the Kalman
filter to obtain the density function $p(\mathbf{h}_t | \mathbf{s}_{0:t}^{(j)}, \mathbf{y}_{0:t})$,
which is needed for the next round of iteration.
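
The Kalman quantities used in steps 1 and 3 can be sketched as follows. This is a simplified illustration under stated assumptions: real-valued states and observations (the paper's channel is complex), a process noise covariance `Q` standing in for the covariance of $\mathbf{G}\mathbf{u}_t$, an observation noise covariance `Rv`, and an observation matrix `A` playing the role of $\mathbf{C}(\mathbf{s}_t)\mathbf{O}$ for one candidate user state. The function names are hypothetical. The predictive log-likelihood is the analytic value of the integral in (12) for that candidate.

```python
import numpy as np

def kalman_predict(mu, P, F, Q):
    """Prediction step: mean and covariance of h_t given past observations."""
    return F @ mu, F @ P @ F.T + Q

def predictive_loglik(y, mu_pred, P_pred, A, Rv):
    """log of the Gaussian integral of p(y_t | s_t, h_t) p(h_t | past) over h_t."""
    S = A @ P_pred @ A.T + Rv                  # innovation covariance
    resid = y - A @ mu_pred
    _, logdet = np.linalg.slogdet(S)
    return -0.5 * (len(y) * np.log(2 * np.pi) + logdet
                   + resid @ np.linalg.solve(S, resid))

def kalman_update(y, mu_pred, P_pred, A, Rv):
    """Update step: mean and covariance of h_t after observing y_t."""
    S = A @ P_pred @ A.T + Rv
    K = P_pred @ A.T @ np.linalg.inv(S)        # Kalman gain
    mu = mu_pred + K @ (y - A @ mu_pred)
    P = (np.eye(len(mu_pred)) - K @ A) @ P_pred
    return mu, P
```

In a combined SIS and Kalman scheme, each sample $j$ would carry its own mean and covariance, `predictive_loglik` would be evaluated once per candidate user state to form the proposal, and `kalman_update` would then be applied with the matrix `A` of the state that was drawn.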


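$$p(\mathbf{s}_t | \mathbf{s}_{0:t-1}^{(j)}, \mathbf{y}_{0:t+p})
\propto \sum_{\mathbf{s}_{t+1:t+p} \in A^p} \prod_{\tau=0}^{p} \int p(\mathbf{y}_{t+\tau} | \mathbf{s}_{t+\tau}, \mathbf{h}_{t+\tau})
\times p(\mathbf{h}_{t+\tau} | \mathbf{s}_{t:t+\tau}, \mathbf{s}_{0:t-1}^{(j)}, \mathbf{y}_{0:t+\tau-1})\, d\mathbf{h}_{t+\tau}
\times p(\mathbf{s}_{t:t+p} | \mathbf{s}_{0:t-1}^{(j)}). \qquad (13)$$

To evaluate this expression, one has to perform predictive
Kalman filtering for all possible future sequences
of user state vectors $\mathbf{s}_{t:t+p}$. The complexity of the
algorithm is proportional to the size of the set $A^p$. The
delayed sample importance function should be weighted
with respect to the posterior density $p(\mathbf{s}_{0:t} | \mathbf{y}_{0:t+p})$, which
turns out to be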

w. 


(j) 


(X  w. 


U) 

t- i  ' 


Es,  p(s 


U)\Ji) 


“OU-l) 


you+p) 


p(yt:i+p-l|So:t-l) 


(14) 


The  denominator  can  be  evaluated  similarly  as  the  im¬ 
portance  function.  When  dealing  with  delayed  esti¬ 
mation,  intuitively  the  number  of  delayed  samples  and 
delayed  weight  should  be  selected  so  that  all  relevant 
observations  are  taken  into  account  according  to  the 
constraint  length  (memory)  of  the  STTC. 


4.  PHASE  AMBIGUITY 

The fading coefficients and the modulated user data
are all unknowns that have to be estimated, and there
may be multiple pairs of estimates with identical posterior
probabilities. If $(\mathbf{C}(\mathbf{s}_t), \mathbf{h}_t)$ is an estimate of (6),
a phase shifted version $(\mathbf{C}(\mathbf{s}_t)\Theta, \Theta^{-1}\mathbf{h}_t)$, where $\Theta$ is
a phase shifting matrix, will be an equally acceptable
estimate. For example, consider the 8-PSK delay diversity
STTC with two transmit antennas as described
in [1]. The codes for the two transmit antennas are

ci(st)  =  e~jS^St  c2(sf)  =  e + 
Figure 2: Constellation of Space Time Trellis Code

where $\mathbf{s}_t = [s_t, s_{t-1}]^T$. The phase shifting matrix in
this case is $\Theta = \mathrm{diag}\{e^{j\theta_1}, e^{j\theta_2}\}$. However, as we use
a recursive algorithm that evolves with time, a phase
shifted version of the channel state estimate with
phase shifting matrix $\Theta = \mathrm{diag}\{e^{j\theta_1}, e^{j\theta_2}\}$ is plausible
over a time period $t : t + q$, where $q \in \mathbb{N}$ is a natural
number, only if the following condition is met

$$c_1^{-1}(c_1(\mathbf{s}) e^{j\theta_1}) = c_2^{-1}(c_2(\mathbf{s}) e^{j\theta_2}) \quad \forall\, \mathbf{s} \in \mathbf{s}_{t:t+q} \qquad (15)$$

i.e., given $\Theta$, a legitimate code vector sequence $\mathbf{C}(\mathbf{s}_t)\Theta,
\cdots, \mathbf{C}(\mathbf{s}_{t+q})\Theta$ can be found. The above condition can
be satisfied by several pairs $(\theta_1, \theta_2)$ in the original
modulation constellation Mq. As a result, phase ambiguity
will occur, and simulations have shown that it can
cause a breakdown of the algorithm. One may consider
allowing the use of a bigger variety of constellations
to avoid the occurrence of (15) $\forall q$, while guaranteeing
a certain amount of coding gain. If the condition cannot
be avoided completely, one should try to prevent
it for larger values of $q$. It is a challenging task to
design constellations that best exploit the spatial and
time diversities with the added requirement of reduced
phase ambiguity. We propose four ad-hoc designed 8-PSK
constellations where each time-slot is divided into
two sub-slots with two transmit antennas. It is assumed
that the fading coefficients of the channel do not
change between the two sub-time slots. The constellations
of $c_1$ to $c_4$ are shown in Figure 2. Phase ambiguity
is greatly reduced by using this new constellation.

5.  SIMULATION 

We simulated a two transmit antenna and one receive
antenna STTC system. The ARMA process describing
the CSI was chosen to be the same as in [2]. We used the
same resampling process as described in [2] and resampled
every 5 steps. The number of delayed weights
and delayed samples was one.

The result of the joint detection and decoding algorithm
is shown in Figure 3, and it was compared
with the genie-aided case, in which an additional stream of
known user data was sent through the same channel for
the direct estimation of the channel. Channel estimates
in terms of mean and covariance were obtained using
Kalman filtering. Then, they were employed in the
combined SIS and Kalman filtering algorithm, replacing
channel estimates obtained from the density function
$p(\mathbf{h}_t | \mathbf{s}_{0:t-1}^{(j)}, \mathbf{y}_{0:t-1})$. The genie-aided case serves


Figure  3:  Simulation  Result 


as a lower bound and the proposed algorithm is 3 dB
away from the bound. The genie-aided case assumes
that the additional information about the channel is
obtained without any cost. However, in practice this is
almost never the case. If we take into account the additional
amount of energy used for channel estimation
in the genie-aided case, the 3 dB difference in the simulation
result can be explained. For every simulated
point, at least 100 symbol errors were accumulated.
ABSTRACT

A  modified  constant  modulus  algorithm  (MCMA)  for  adaptive 
equalization  of  wireless  indoor  channel  for  QAM  signals  is 
presented.  The  algorithm  minimizes  an  error  cost  function  that 
includes  both  amplitude  and  phase  of  the  equalizer  output.  In 
addition  to  the  amplitude-dependent  term  that  is  provided  by  the 
conventional  constant  modulus  algorithm  (CMA),  the  cost 
function  includes  a  signal  constellation  matched  error  (CME) 
term.  This  term  speeds  up  convergence  and  allows  the  equalizer 
to  switch  to  Decision  Directed  (DD),  or  any  soft-decision  mode, 
faster  than  the  CMA  applied  alone.  The  constellation-matched 
error  term  is  constructed  using  polynomials  with  desirable 
properties.  The  MCMA  is  applied  to  a  decision  feedback 
equalizer  and  shown  to  provide  improved  performance  over  dual 
mode  techniques. 


1.  INTRODUCTION 

The  Constant  Modulus  Algorithm  (CMA)  [1, 2, 3,  4,  5]  is  a  blind 
technique  that  achieves  channel  equalization  without  the  need  of 
a  training  sequence.  The  use  of  CMA  in  the  initial  phase  of 
adaptive  equalization  and  then  switching  (dual  mode)  to  DD  or 
another  constellation-matched  algorithm,  at  appropriate  values  of 
mean-square  error  (MSE),  is  typically  performed  to  improve  both 
the global and local convergence properties [3, 5, 6, 7]. In [6],
the constellation-matched error (CME) term is the
magnitude-square of a finite-order polynomial of the equalizer output with
zeros  at  the  signal  message  points,  whereas  in  [7],  this  term  is  the 
complement  of  the  sum  of  Gaussian  functions  centered  at  the 
message  points. 

Unlike  dual-mode  techniques,  the  proposed  modified  CMA 
(MCMA)  is  similar,  in  concept,  to  the  “Stop-and-Go”  algorithm 
in  that  the  equalizer  integrates  a  constellation-matched  error  term 
in a continuous manner during both initialization and
convergence. However, in the MCMA, the cost function is
constructed from two separate, well-defined error terms: one is
identical to the CMA case, and the other corresponds to a DD
mode or any other constellation-matched error function.

The  proposed  MCMA  belongs  to  stochastic  gradient  descent 
schemes.  It  performs  well  for  QAM  and  under  dynamic  channels, 
as  it  converges  to  acceptable  levels  of  MSE  faster  than  the  CMA 
if  applied  alone.  At  these  levels,  adaptation  may  proceed  with 
only  the  signal  constellation  matched  error  term  of  the  cost 
function  and  without  the  CMA  part.  The  latter  may  cause 


residual  errors  at  local  convergence  regions,  specifically  for  high 
order  QAM. 

The  paper  provides  the  framework  for  defining  and  selecting  an 
appropriate  CME  term.  This  term  should  satisfy  three  main 
desirable  properties,  namely,  uniformity,  symmetry,  and 
zero/maximum  penalties  at  the  zero/maximum  deviations  from 
the  QAM  symbols.  These  properties  serve  to  shape  the  cost 
function  in  a  manner  that  is  not  biased  towards  any  specific 
alphabet,  and  to  properly  alert  the  adaptive  equalizer  when  high 
error values are produced. It is noted that neither of the CME
functions defined in [6, 7] strictly satisfies these properties over
the entire extent of the constellation region. On the other hand, an
even-power cosinusoidal CME function satisfies the above conditions
and yields a simple gradient, and the bound on the associated
adaptation step size is inversely proportional to its power.

Section  2  of  this  paper  presents  the  proposed  modified  constant 
modulus  algorithm.  In  Section  3,  the  local  convergence 
properties  are  analyzed.  The  simulation  performance  of  the 
MCMA  is  presented  in  Section  4. 

2.  MODIFIED  CONSTANT  MODULUS 
ALGORITHM 


The  general  form  of  cost  function  for  the  modified  constant 
modulus  algorithm  is  given  by 


$$J(w) = E\{(|z_k|^2 - A)^2 + \rho\,(g(z_{kr}) + g(z_{ki}))\} \tag{1}$$

$$A = \frac{E\{|s_k|^4\}}{E\{|s_k|^2\}} \tag{2}$$

where $z_k$ is the received baseband complex signal, $z_{kr}$ and $z_{ki}$ are
the real and imaginary parts of $z_k$, respectively, and $s_k$ is the
transmitted symbol. The first term in (1) is the amplitude error
function, which is the cost function of the conventional CMA. $g(x)$ is
a constellation-matched error function and $\rho$ is a weighting factor
that trades off amplitude and constellation-matched errors. We
define $g(x)$ as a polynomial function. This polynomial must strive
to satisfy three key properties in the range
$-2Ld < x < 2Ld$ for QAM signals,

$$s_x = (2n_x - 1)d, \quad n_x = -L+1, \ldots, L$$
$$s_y = (2n_y - 1)d, \quad n_y = -L+1, \ldots, L \tag{3}$$

where $(s_x, s_y)$ is the symbol point, $L$ is an integer and $2d$
is the minimum distance between symbols.
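The symbol grid in (3) can be made concrete with a short sketch. It assumes equiprobable symbols and the usual CMA dispersion constant $A = E\{|s_k|^4\}/E\{|s_k|^2\}$; the function names `qam_constellation` and `dispersion_constant` are illustrative and do not come from the paper.

```python
import numpy as np

def qam_constellation(L: int, d: float = 1.0) -> np.ndarray:
    """Square QAM points s_x + j*s_y with s = (2n - 1)d, n = -L+1, ..., L."""
    levels = (2 * np.arange(-L + 1, L + 1) - 1) * d
    # Every combination of an in-phase and a quadrature level.
    return (levels[:, None] + 1j * levels[None, :]).ravel()

def dispersion_constant(points: np.ndarray) -> float:
    """Assumed CMA constant A = E{|s|^4} / E{|s|^2} for equiprobable symbols."""
    power = np.abs(points) ** 2
    return float(np.mean(power ** 2) / np.mean(power))

# L = 4 gives 8 levels per axis (-7d, ..., 7d), i.e. 64-QAM.
points = qam_constellation(L=4, d=1.0)
A = dispersion_constant(points)
```

With $L = 4$ and $d = 1$ the grid has 64 points, all inside the range $-2Ld < x < 2Ld$ on each axis.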
Property 1:

The polynomial should be uniform in that it does not favor or
penalize information symbols over others, which justifies its
periodic behavior,

$$g(x) = g(x + 2ld) \tag{4}$$

where $l$ is an integer.

Property  2: 

The polynomial should be symmetric around each alphabet, i.e.,

$$g(s_x + x) = g(s_x - x), \quad \forall x: 0 < x < d \tag{5}$$


Property  3: 

The maximum value, which is normalized to one, is reached at
the center point between two consecutive alphabets. The
minimum values are zeros and occur only at the constellation
points. Accordingly, the cost function places the highest penalty
at the maximum deviation and no penalty for zero errors. That is,

$$g(s_x \pm d) = 1 \quad \text{and} \quad g(s_x) = 0 \tag{6}$$


The  above  three  polynomial  properties,  although  desirable,  may 
consume  a  large  number  of  degrees  of  freedom  which  translate 
into  high  polynomial  order  and  complex  error  gradient 
expression.  The  latter  increases  algorithm  computations  per 
iteration  update  and  may  render  the  algorithm  unattractive  for 
real-time  implementations. 

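The gradient recursion for the equalizer weight vector $w$ can be
formulated as [8]

$$w_{k+1} = w_k - \mu \nabla J(w)\big|_{w = w_k} \tag{7}$$

where $\mu$ is a step size. The derivative of $J(w)$ can be carried out
term by term from (1). The first term is directly from the CMA.
Its derivative is

$$\frac{\partial}{\partial w} E\{(|z_k|^2 - A)^2\} = 4E\{(|z_k|^2 - A)\, z_k^* x_k\} \tag{8}$$

The $z_{kr}$ and $z_{ki}$ are expressed explicitly in terms of $w$,

$$z_{kr} = \frac{z_k + z_k^*}{2} = \frac{w^H x_k + x_k^H w}{2}, \qquad z_{ki} = \frac{z_k - z_k^*}{2j} = \frac{w^H x_k - x_k^H w}{2j} \tag{9}$$

then it is easy to show that

$$\frac{\partial}{\partial w} z_{kr} = x_k, \qquad \frac{\partial}{\partial w} z_{ki} = -j x_k \tag{10}$$

Based on (10), it can be shown that

$$\frac{\partial}{\partial w} E\{g(z_{kr}) + g(z_{ki})\} = E\{\eta_k x_k\} \tag{11}$$

where

$$\eta_k = \frac{d}{dx} g(x)\Big|_{x = z_{kr}} - j\, \frac{d}{dx} g(x)\Big|_{x = z_{ki}} \tag{12}$$

According to (8), (10) and (12), the derivative of $J(w)$ is obtained
as

$$\nabla J(w) = 4E\left\{\left[(|z_k|^2 - A)\, z_k^* + \frac{\rho}{4}\eta_k\right] x_k\right\} \tag{13}$$

Therefore, the modified CMA updating equation (with the constant
factor 4 absorbed into $\mu$) is

$$w_{k+1} = w_k - \mu\, e_k x_k \tag{14}$$

and

$$e_k = (|z_k|^2 - A)\, z_k^* + \frac{\rho}{4}\eta_k \tag{15}$$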
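A minimal sketch of one stochastic-gradient update may help here. It assumes the equalizer output is $z_k = w^H x_k$, takes the weighting factor as `rho`, and uses the simplest even-power cosine $g(x) = \cos^2(\pi x / 2d)$ as the constellation-matched term. The names `mcma_cost`, `mcma_step` and `g_prime` are hypothetical, and the constant factor of the gradient is folded into the step size `mu`.

```python
import numpy as np

def g(x, d=1.0):
    """Illustrative CME term: zero at odd multiples of d, one at even multiples."""
    return np.cos(np.pi * x / (2 * d)) ** 2

def g_prime(x, d=1.0):
    """Derivative of cos^2(pi x / (2d)), i.e. -(pi / (2d)) sin(pi x / d)."""
    return -(np.pi / (2 * d)) * np.sin(np.pi * x / d)

def mcma_cost(w, x, A, rho, d=1.0):
    """Instantaneous cost (|z|^2 - A)^2 + rho * (g(z_r) + g(z_i))."""
    z = np.vdot(w, x)                      # z = w^H x (vdot conjugates w)
    return (abs(z) ** 2 - A) ** 2 + rho * (g(z.real, d) + g(z.imag, d))

def mcma_step(w, x, A, rho, mu, d=1.0):
    """One update w <- w - mu * e * x with the combined error term e."""
    z = np.vdot(w, x)
    eta = g_prime(z.real, d) - 1j * g_prime(z.imag, d)
    e = (abs(z) ** 2 - A) * np.conj(z) + 0.25 * rho * eta
    return w - mu * e * x

# Toy usage: a single-tap equalizer and one received sample.
w0 = np.array([0.5 + 0.0j])
xk = np.array([1.0 + 1.0j])
w1 = mcma_step(w0, xk, A=2.0, rho=1.0, mu=0.01)
```

For a small step size the instantaneous cost decreases after the update, which is the behaviour expected of a gradient-descent step on this cost.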


Figure 1. $g_c(x)$: (a) n=1, (b) n=2, (c) n=3, (d) n=4.


Figure 2. $g_s(x)$: (a) n=1, (b) n=2, (c) n=3, (d) n=4.

In [10], finite-order polynomials are designed to form the desired
CME term. It can readily be verified that even-power cosinusoidal
functions, representing infinite-order polynomials, shown in
Figure 1, satisfy the above three properties [10, 11],

$$g_c(x) = \cos^{2n}\left(\frac{x}{2d}\pi\right) \tag{16}$$

where $n$ is an integer. It is also found that

$$g_s(x) = 1 - \sin^{2n}\left(\frac{x}{2d}\pi\right) \tag{17}$$

satisfies the same three properties, as shown in Figure 2. The
functions in (16) and (17) are polynomials of infinite order, but have
simple closed-form representations, and their values can be
obtained using look-up tables. Comparing Fig. 1(b) with Fig. 2(b),
Fig. 1(c) with Fig. 2(c), and Fig. 1(d) with Fig. 2(d), it is clear that
for $n > 1$, $g_c(x)$ is flat at the symbol points, whereas $g_s(x)$ is
sharp. This behavior has a desirable effect on the performance of
the MCMA, as shown in the simulation examples.
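The three properties can be checked numerically for both cosinusoidal functions. The sketch below assumes the forms $g_c(x) = \cos^{2n}(\pi x / 2d)$ and $g_s(x) = 1 - \sin^{2n}(\pi x / 2d)$ and symbol levels at odd multiples of $d$; the helper `check_properties` is illustrative only.

```python
import numpy as np

def g_c(x, n, d=1.0):
    """Even-power cosine CME term."""
    return np.cos(np.pi * x / (2 * d)) ** (2 * n)

def g_s(x, n, d=1.0):
    """Complement of the even-power sine."""
    return 1.0 - np.sin(np.pi * x / (2 * d)) ** (2 * n)

def check_properties(g, n, L=4, d=1.0, tol=1e-9):
    """Return True if g is periodic, symmetric, and has the required zeros and peaks."""
    symbols = (2 * np.arange(-L + 1, L + 1) - 1) * d
    x = np.linspace(0.0, d, 101)
    # Property 1: period 2d.
    periodic = np.allclose(g(x, n, d), g(x + 2 * d, n, d), atol=tol)
    # Property 2: symmetry around every symbol level.
    symmetric = all(
        np.allclose(g(s + x, n, d), g(s - x, n, d), atol=tol) for s in symbols
    )
    # Property 3: zero at the symbols, one halfway between neighbours.
    zeros = np.allclose(g(symbols, n, d), 0.0, atol=tol)
    peaks = np.allclose(g(symbols + d, n, d), 1.0, atol=tol) and np.allclose(
        g(symbols - d, n, d), 1.0, atol=tol
    )
    return periodic and symmetric and zeros and peaks

results = {(g.__name__, n): check_properties(g, n) for g in (g_c, g_s) for n in (1, 2, 3, 4)}
```

Evaluating both functions slightly off a symbol point, for example at $x = 1.1d$ with $n = 3$, shows the difference in sharpness: $g_s$ already assigns a noticeable penalty while $g_c$ is still close to zero.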

3.  LOCAL  CONVERGENCE  PROPERTIES 

It  is  shown  that  after  MCMA  converges  to  acceptable  levels  of 
MSE,  adaptation  may  proceed  with  only  the  signal  constellation 
matched  error  term  of  the  cost  function  and  without  the  CMA 
part.  The  latter  may  cause  residual  errors  at  local  convergence 
regions,  specifically  for  high  order  QAM.  In  this  section,  we 
analyze  the  local  convergence  properties  of  the  MCMA  using 
(17). The analysis closely follows that given in [7]. We denote $s$
and $x$ as the input and output channel sequences. If the channel
matrix is $C$, then $x = Cs + v$, where $v$ is additive white noise.
Denoting $w$ as the equalizer weight vector, the output of the
equalizer is $w^H x$. The input sequence consists of independent
identically distributed (iid) symbols. We denote by $w_0$ the ideal
equalizer weight vector such that $w_0^H C s$ equals one of the
constellation points. In general, the equalizer vector is $w = w_0 +
\Delta w$, where $\Delta w$ represents the perturbation. Assuming a small
deviation $\Delta w$ of the equalizer weights with respect to the ideal
vector $w_0$ and choosing $g_s(x)$ as the constellation-matched
function, we obtain

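$$J_f(w) = E\left\{\left[1 - \sin^{2n}\left(\frac{z_r}{2d}\pi\right)\right] + \left[1 - \sin^{2n}\left(\frac{z_i}{2d}\pi\right)\right]\right\} \tag{18}$$

Using Taylor series expansion around the message point,

$$J_f(w) \approx E\left\{n\left(\frac{\pi}{2d}\right)^2 \left[(z_r - s_r)^2 + (z_i - s_i)^2\right]\right\} \tag{19}$$

where we have ignored the terms of high orders. The above
equation can be expressed as

$$J_f(w) = n\left(\frac{\pi}{2d}\right)^2 E\{|\Delta w^H C s + w^H v|^2\} = n\left(\frac{\pi}{2d}\right)^2 \{\sigma_s^2\, \Delta w^H C C^H \Delta w + \sigma_v^2\, w^H w\} \tag{20}$$

where $\sigma_s^2$, $\sigma_v^2$ indicate the average power of the transmitted
sequence and the noise variance, respectively. The respective
gradient is

$$\nabla J_f = 2n\left(\frac{\pi}{2d}\right)^2 \sigma_s^2 \left\{C C^H \Delta w + \frac{\sigma_v^2}{\sigma_s^2}\, w\right\} \tag{21}$$

In a noise-free environment,

$$w_{k+1} = w_k - \mu_f\, 2n\left(\frac{\pi}{2d}\right)^2 \sigma_s^2\, C C^H \Delta w_k \tag{22}$$

Subtracting $w_0$ from both sides of (22),

$$\Delta w_{k+1} = \left(I - \mu_f\, 2n\left(\frac{\pi}{2d}\right)^2 \sigma_s^2\, C C^H\right) \Delta w_k \tag{23}$$

where $I$ is the identity matrix. This recursive rule converges if the
parameter $\mu_f$ is chosen to satisfy the following inequality

$$0 < \mu_f < \frac{1}{n\left(\frac{\pi}{2d}\right)^2 \sigma_s^2\, \lambda_{max}} \tag{24}$$

where $\lambda_{max}$ is the largest eigenvalue of $C C^H$. From the above
inequality, we observe that small values of $n$ allow the selection
of high values of $\mu_f$ for the same channel, i.e., $\mu_f \propto 1/n$. At the
same time, small values of $n$ correspond to flat nulls of the cost
function, which affects the MSE performance.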
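The step-size condition can be illustrated numerically. The sketch below assumes that the recursion (23) has the form $\Delta w_{k+1} = (I - \mu_f\, 2n(\pi/2d)^2 \sigma_s^2\, C C^H)\Delta w_k$, so that convergence requires $\mu_f < 1 / (n(\pi/2d)^2 \sigma_s^2 \lambda_{max})$. The Toeplitz layout of `channel_matrix` and the four-tap equalizer length are assumptions made only for this example.

```python
import numpy as np

def channel_matrix(h, num_taps):
    """Convolution matrix C (num_taps x (num_taps + len(h) - 1)); assumed layout."""
    h = np.asarray(h, dtype=complex)
    C = np.zeros((num_taps, num_taps + len(h) - 1), dtype=complex)
    for i in range(num_taps):
        C[i, i:i + len(h)] = h
    return C

def gain(n, d, sigma_s2):
    """Common factor 2n (pi / 2d)^2 sigma_s^2 of the linearized recursion."""
    return 2 * n * (np.pi / (2 * d)) ** 2 * sigma_s2

def step_size_bound(C, n, d, sigma_s2):
    """Upper limit on mu_f: 2 / (gain * lambda_max) = 1 / (n (pi/2d)^2 sigma_s^2 lambda_max)."""
    lam_max = np.linalg.eigvalsh(C @ C.conj().T).max()
    return 2.0 / (gain(n, d, sigma_s2) * lam_max)

def spectral_radius(C, n, d, sigma_s2, mu_f):
    """Largest eigenvalue magnitude of the iteration matrix I - mu_f * gain * C C^H."""
    M = C @ C.conj().T
    T = np.eye(M.shape[0]) - mu_f * gain(n, d, sigma_s2) * M
    return np.abs(np.linalg.eigvals(T)).max()

C = channel_matrix([1.0, 0.1294 - 0.4830j], num_taps=4)
sigma_s2 = 42.0                      # average power of 64-QAM with d = 1
bound_n1 = step_size_bound(C, n=1, d=1.0, sigma_s2=sigma_s2)
bound_n3 = step_size_bound(C, n=3, d=1.0, sigma_s2=sigma_s2)
```

A step size just below the bound keeps the spectral radius under one, a step size just above it does not, and tripling $n$ shrinks the admissible range by a factor of three.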

4.  SIMULATIONS 

In  the  simulations,  the  transmitted  signal  is  64  QAM.  The 
channel  is  frequency  selective  and  comprised  of  two  multipaths 
with  complex  coefficients  [1.0000  0.1294-j0.4830].  The  signal- 
to-noise  ratio  (SNR)  is  30dB.  Decision  feedback  equalizers  were 
used  which  have  16  taps  for  the  feed-forward  filter  and  16  taps 
for  the  feedback  filter.  Figure  3  shows  the  performance 
comparison  of  the  conventional  CMA  and  the  MCMA, 
implementing  equations  (16,  17).  It  is  evident  from  the  MSE 
curves  that  the  MCMA  improves  the  equalizer  performance  by 
offering  faster  convergence  and  smaller  misadjustment.  Upon 
convergence,  the  Symbol  Error  Rate  (SER)  for  CMA  becomes 
$1.3 \times 10^{-2}$, whereas for the MCMA, the SER is less than 10 1 .
The  main  reason  for  such  improvement  is  that  the  MCMA 
considers  both  the  modulus  and  constellation  properties  of  the 
transmitted  signals. 
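The simulation setup described above can be reproduced roughly as follows. This is only a sketch of the signal generation: the SNR is taken relative to the power of the noise-free channel output, symbols are drawn uniformly, and the function name `simulate_received` and the fixed seed are choices made for this example.

```python
import numpy as np

# Two-path channel coefficients quoted in the text.
CHANNEL = np.array([1.0, 0.1294 - 0.4830j])

def simulate_received(num_symbols, snr_db=30.0, d=1.0, seed=0):
    """Return (symbols, noisy channel output, noise-free channel output)."""
    rng = np.random.default_rng(seed)
    levels = (2 * np.arange(-3, 5) - 1) * d            # -7d, -5d, ..., 7d
    s = rng.choice(levels, num_symbols) + 1j * rng.choice(levels, num_symbols)
    clean = np.convolve(s, CHANNEL)[:num_symbols]
    # Noise variance set from the measured output power (assumed SNR definition).
    noise_var = np.mean(np.abs(clean) ** 2) / 10 ** (snr_db / 10)
    noise = np.sqrt(noise_var / 2) * (
        rng.standard_normal(num_symbols) + 1j * rng.standard_normal(num_symbols)
    )
    return s, clean + noise, clean

symbols, received, clean = simulate_received(20000)
snr_est = 10 * np.log10(
    np.mean(np.abs(clean) ** 2) / np.mean(np.abs(received - clean) ** 2)
)
```

The received sequence would then be fed to the decision feedback equalizer under test; the equalizer itself is not part of this sketch.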


Figure 3. Performance comparison for CMA and
MCMA. (a) CMA; (b) MCMA, $g_c(x) = g_s(x)$, n=1; (c)
MCMA, $g_c(x)$, n=3; (d) MCMA, $g_s(x)$, n=3.

To determine the effect of the cosinusoidal power term on
MCMA performance, we compare Figs. 3(b), (c) and (d). The mean
square error (MSE) is 0.0812 for n=1, where $g_c(x) = g_s(x)$. For
n=3, the MSE is 0.1102 for $g_c(x)$ and 0.0783 for $g_s(x)$. By
these values, Fig. 3(d) has the best performance while Fig. 3(c) has
the worst performance. Therefore, for the MCMA, the
constellation-matched function with a sharp behavior at the
symbol points enhances the performance at high SNRs.

The above simulation has demonstrated that the MCMA
performs well for QAM, as it converges to acceptable levels of
MSE much faster than the CMA applied alone. At these
levels, adaptation may proceed with only the signal constellation
matched error term of the cost function and without the CMA
part. Figure 4 compares the performance of CMA and MCMA
switching to DD after convergence (3000 samples). Although
the final MSE is the same for the two cases, using the
MCMA before switching needs less convergence time,
because the MCMA converges faster and has less
misadjustment than the CMA.


Figure  4.  Performance  comparison  for  dual-mode 
algorithms  with  CMA  and  MCMA. 


Figure 5. Performance comparison for CMA and
MCMA with different constellation-matched functions
(SNR = ∞).

In [7], the constellation-matched term is the complement of a
sum of Gaussian functions centered at the symbol points, which
also approximately satisfies the three desired properties (4)-(6).
This CME can provide almost the same performance as the
cosinusoidal CME if the corresponding parameters are chosen
properly. Figure 5 compares the performance of the MCMA for


both  CMEs.  It  is  evident  that  the  two  CMEs  have  equal 
performance.  However,  it  should  be  noted  that  using 
cosinusoidal  functions  can  greatly  reduce  computations,  because 
in  [7],  the  CME  involves  the  sum  of  Gaussian  functions  centered 
at  every  constellation  point,  leading  to  computations  that  are 
proportional  to  the  number  of  symbol  points,  while  the 
cosinusoidal  functions  have  no  such  dependence. 

5.  CONCLUSION 

In this paper, we have presented an equalizer that minimizes a
cost  function  made  up  of  blind  and  constellation-dependent 
terms.  The  adaptive  implementation  is  referred  to  as  modified 
constant  modulus  algorithm.  It  is  shown  that  the  MCMA  leads  to 
faster  convergence  and  smaller  misadjustment  than  the  CMA. 
This  property  permits  one  to  switch  to  constellation-dependent 
mode  much  faster  than  the  case  when  CMA  is  applied  alone 
during  the  initialization  process.  The  paper  also  presented  a 
framework  to  select  the  constellation-matched  error  term  suitable 
for  cost  function  minimizations. 
ABSTRACT

We  present  two  different  fractionally  spaced  (FS)  equalisers 
based  on  subband  methods,  with  the  aim  of  reducing  the 
computational  complexity  and  increasing  the  convergence 
rate  of  a  standard  fullband  FS  equaliser.  This  is  achieved 
by  operating  in  decimated  subbands  at  a  considerably  lower 
update  rate  and  by  exploiting  the  prewhitening  effect  that 
a  filter  bank  has  on  the  considerable  spectral  dynamics  of  a 
signal  received  through  a  severely  distorting  channel.  The 
two  presented  subband  structures  differ  in  their  level  of  re¬ 
alising  the  feedforward  and  feedback  part  of  the  equaliser 
in  the  subband  domain,  with  distinct  impacts  on  the  up¬ 
dating.  Simulation  results  pinpoint  the  faster  convergence 
at  lower  cost  for  the  proposed  subband  equalisers. 

1.  INTRODUCTION 

Linear  channel  distortions  caused  by  multipath  propaga¬ 
tion  and  limited  bandwidth  lead  to  inter-symbol  interfer¬ 
ence  (ISI)  at  the  receiver,  which  in  many  cases  results  in  a 
high  bit  error  rate  in  the  detection.  Therefore,  many  dif¬ 
ferent  adaptive  equalisation  structures  have  been  proposed 
in the past in order to compensate for these channel
distortions in the receiver. Most popular amongst the subset
of linear or minimum mean-square error (MMSE)
equalisers are currently fractionally spaced (FS) architectures [1],
whereby  the  equalisation  filter  operates  at  a  rate  higher 
than  the  symbol  rate. 

A  standard  fractionally  spaced  equaliser  is  shown  in 
Fig.  1.  The  structure  operates  the  feedforward  (FF)  part  of 
the equaliser at an oversampled rate, here twice the
symbol rate. In the flow graph in Fig. 1, the FF part is
implemented as a polyphase structure [2], with the two polyphase
components $a_0[n]$ and $a_1[n]$ of the adaptive FF filter
running at the lower symbol rate. The two filters $a_0[n]$ and $a_1[n]$
are excited by the two polyphase components of the
oversampled channel output $x[m]$. The feedback (FB) part of
the equaliser is symbol spaced. This is due to the
equation error formulation or the decision feedback mode of the
equaliser. In the FB part, the adaptive filter $b[n]$ can be
excited either by a training signal (switch position 1), which is a
copy of the transmitted symbol sequence $u[n]$ delayed by $\Delta$
periods, or in decision feedback mode (switch position 2).
All FF and FB parts $a_0[n]$, $a_1[n]$ and $b[n]$ are adaptive and
updated by a suitable algorithm at the symbol rate based
on an appropriate criterion of the equalisation error $e[n]$.
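The equivalence between a T/2-spaced FF filter and its two symbol-rate polyphase branches can be checked with a short sketch. It assumes the decomposition $a_0[k] = a[2k]$, $a_1[k] = a[2k+1]$ with branch inputs $x[2n]$ and $x[2n-1]$, and zero input before time zero; the function names are illustrative.

```python
import numpy as np

def ff_direct(a, x):
    """Filter at the oversampled rate, then keep the outputs at symbol instants."""
    return np.convolve(a, x)[::2]

def ff_polyphase(a, x):
    """Same output computed by two symbol-rate branches a_0[n] and a_1[n]."""
    a0, a1 = a[0::2], a[1::2]
    x0 = x[0::2]                                  # x[2n]
    x1 = np.concatenate(([0.0], x[1::2]))         # x[2n - 1], with x[-1] = 0
    y0 = np.convolve(a0, x0)
    y1 = np.convolve(a1, x1)
    y = np.zeros(max(len(y0), len(y1)), dtype=np.result_type(y0, y1))
    y[:len(y0)] += y0
    y[:len(y1)] += y1
    return y

rng = np.random.default_rng(1)
a = rng.standard_normal(6) + 1j * rng.standard_normal(6)     # FF taps, T/2 spaced
x = rng.standard_normal(20) + 1j * rng.standard_normal(20)   # oversampled input
y_direct = ff_direct(a, x)
y_poly = ff_polyphase(a, x)
```

Both routes give the same symbol-rate output, but the polyphase form never computes the discarded odd-indexed samples.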


Figure  1:  Fractionally  spaced  equaliser  with  a  polyphase 
representation  of  the  FF  part. 


A  fractionally  spaced  equaliser  may  suffer  from  consid¬ 
erable  computational  complexity  due  to  the  requirement  for 
long  filters  if  the  channel  exhibits  severe  distortions  [3],  and 
from  slow  convergence  due  to  strong  spectral  dynamics  at 
the  input  to  the  equaliser  [4].  These  characteristics  have 
previously  triggered  the  application  of  subband  techniques 
to  FS  equalisers  [5],  based  on  the  computational  reduction, 
prewhitening,  and  parallelisation  properties  of  the  subband 
approach  [6,  7,  8].  In  this  contribution,  we  evaluate  two 
different  subband  architectures  for  FS  equalisers.  This  in¬ 
cludes  a  novel  scheme  for  including  the  equaliser’s  feedback 
section  into  the  subband  domain,  and  the  incorporation 
of  decision  directed  subband  equaliser  structures  to  track 
channel  alterations  after  initial  equaliser  training. 

This  paper  is  organised  as  follows.  In  Sec.  2,  we  briefly 
describe  the  channel  characteristics  and  motivate  subband 
decompositions.  Then,  we  introduce  the  proposed  subband 
adaptive  equaliser  structures  and  discuss  the  complexity  is¬ 
sue  in  Sec.  3.  In  Sec.  4,  we  present  some  simulation  results 
to  demonstrate  the  performance  of  the  subband  approach. 

2.  CHANNEL  CHARACTERISTICS  AND 
SUBBAND  DECOMPOSITIONS 

For  the  popularly  applied  least  mean  square  (LMS)  type  al¬ 
gorithm  in  equalisation,  the  convergence  speed  is  inversely 
proportional  to  the  eigenvalue  spread  of  its  input  signal  [9]. 
In  turn  the  eigenvalue  spread  of  a  signal  can  be  approxi¬ 
mated  by  the  ratio  between  the  maximum  and  minimum 
value  of  its  power  spectral  density  (PSD).  As  an  example 
for  the  spectral  dynamics  that  can  be  encountered,  we  con¬ 
sider  a  severely  dispersive  channel  given  in  Fig.  2.  The 


0-7803-701 1-2/01/$10.00  ©2001  IEEE 




selected channel with a delay spread of approximately 100
symbol periods exhibits additional spectral zeros that
reduce the equaliser convergence performance, and also
encompasses the transmit and receive filters, which impose a
low-pass characteristic on the PSD.
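The link between eigenvalue spread and the PSD can be illustrated with a toy example. The sketch assumes a white, unit-variance sequence coloured by a short real FIR response `c`, which stands in for the dispersive channel and is not the channel of Fig. 2; the function names are illustrative.

```python
import numpy as np

def autocorrelation(c, max_lag):
    """r[l], l = 0..max_lag, of white unit-variance noise filtered by real FIR c."""
    c = np.asarray(c, dtype=float)
    full = np.correlate(c, c, mode="full")       # lags -(len-1) .. (len-1)
    mid = len(c) - 1
    r = np.zeros(max_lag + 1)
    n = min(max_lag + 1, len(c))
    r[:n] = full[mid:mid + n]
    return r

def eigenvalue_spread(c, order):
    """lambda_max / lambda_min of the order x order autocorrelation matrix."""
    r = autocorrelation(c, order - 1)
    idx = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    R = r[idx]                                   # symmetric Toeplitz matrix
    lam = np.linalg.eigvalsh(R)
    return lam.max() / lam.min()

def psd_ratio(c, nfft=4096):
    """Ratio of maximum to minimum PSD on an nfft-point frequency grid."""
    psd = np.abs(np.fft.fft(c, nfft)) ** 2
    return psd.max() / psd.min()

c = [1.0, 0.5]
spread_8 = eigenvalue_spread(c, 8)
spread_32 = eigenvalue_spread(c, 32)
ratio = psd_ratio(c)
```

The spread stays below the PSD ratio and approaches it as the filter order grows, which is why the ratio serves as an approximation for long adaptive filters.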


coefficients is given by

$$L_{Subband} = \frac{L_{Fullband} + L_p}{N} \tag{1}$$

where $L_p$ denotes the length of the prototype filter.


Figure  2:  Channel  spectral  dynamics  characteristic  with 
transmit-  and  receive  filter. 


A  general  decomposition  into  K  frequency  bands  de¬ 
cimated  by  N  (so-called  “subbands”)  is  shown  in  Fig.  3. 
The  filters  in  both  analysis  and  synthesis  bank  are  band¬ 
pass  filters,  which,  together  with  the  decimation  process 
yield  a  prewhitening  of  the  subband  signals  compared  to 
the  input.  Further,  computational  savings  arise  due  to  an 
N  times  lower  update  rate  and  lower  filter  orders  compared 
to fullband implementations. For adaptive filtering
applications, adaptive filters can be operated in each band
independently, which lends itself to a parallel
implementation. As a drawback, however, subband structures introduce
aliasing that limits the algorithm performance. Therefore,
oversampled filter banks (OSFB) with an oversampling
ratio K/N > 1 are preferred here [5, 6]. An example with
K = 16 subband channels is indicated by the band edges
in Fig. 2, where the eigenvalue spread within each band is
reduced. Therefore, faster convergence of the algorithm
is expected with subband decompositions.


Figure 3: K-channel filter bank decimated by N with
analysis filters $H_k(z)$ and synthesis filters $G_k(z)$.


An additional benefit of the subband implementation is
that an impulse response in the decimated domain can be
modelled with fewer coefficients than required in the fullband
case due to the increased sampling period, achieving similar
modelling capabilities. In general, this decreases the
necessary filter length by a factor of N, whereby a moderate
overhead of prototype filter coefficients has to be taken into
account, as potentially fractional delays have to be modelled
in the subband domain [7]. The length of subband filter


3.  SUBBAND  ADAPTIVE  EQUALISER 
STRUCTURES 

In  this  section,  we  introduce  two  different  subband  adaptive 
equaliser  structures  and  discuss  the  complexity  issues  of 
the  equalisers.  For  the  subband  implementation,  we  utilise 
OSFBs  as  described  in  reference  [10]. 

3.1.  Structure  I 

For  subband  equaliser  structure  I,  the  FF  part  of  the  full- 
band  equaliser  in  Fig.  1,  is  projected  into  subbands.  The 
resulting  architecture  is  shown  in  Fig.  4,  whereby  H  and  G 
denote  analysis  and  synthesis  filter  bank  blocks  including 
decimation  and  expansion  as  given  in  Fig.  3.  The  system 
blocks $A_0$ and $A_1$ are diagonal polynomial matrices
representing independent filters within each of the K subbands.
As the FB part has to be performed at symbol rate, the
error is evaluated based on the FF outputs reconstructed
by G, and is projected back into the subband domain to
update the filters in $A_0$ and $A_1$.

A drawback of the update procedure for the FF part is
that the error signal contains a transfer path. This transfer
path can be approximated by a delay identical to $L_p/N$.
This delay has been reported to result in degraded
convergence speed [11]. To overcome this problem, a modification
of  the  structure  I  architecture  will  be  introduced  by  inte¬ 
grating  the  FB  part  into  subbands. 


Figure  4:  Adaptive  equaliser  structure  I  with  the  FF  part 
in  subband. 


3.2.  Structure  II 

A  subband  equaliser  structure  II  is  shown  in  Fig.  5,  which 
has  the  aim  to  overcome  slow  convergence  due  to  the  error 
transfer  path  in  structure  I.  The  error  signal  is  now  formed 
in  the  subband  domain  and  can  be  used  to  delaylessly  up¬ 
date  both  the  FF  and  FB  parts.  Similarly  to  structure  I, 
B  is  of  diagonal  polynomial  form  holding  the  adaptive  FB 
filters  running  independently  within  each  subband. 

In  structure  II  architecture,  all  adaptive  filters  are  up¬ 
dated  by  the  immediately  formed  subband  errors  at  the 
4.  SIMULATIONS  AND  RESULTS 


Figure  5:  Adaptive  equaliser  structure  II  with  both  the  FF 
and  FB  parts  in  subbands. 


same  time.  This  is  expected  to  provide  improved  conver¬ 
gence  characteristics  over  structure  I.  However,  as  the  er¬ 
ror  is  calculated  in  the  subband  domain,  this  structure  can 
only  be  used  in  training  mode.  The  decision  directed  learn¬ 
ing  mode  —  switch  position  2  in  the  fullband  structure  in 
Fig.  1  and  the  subband  structure  I  in  Fig.  4  —  requires  a 
non-linearity  that  cannot  be  transferred  into  the  subband 
domain. Therefore, if decision directed mode were to be
performed, structure I would have to be selected. By
appropriate subband projections, the FB filter $b[n]$ in Fig. 4
can be reconstructed from B in Fig. 5.

3.3.  Computational  Complexity 

The  complexity  of  a  fullband  equaliser  implementation  in 
terms  of  multiply-accumulates  (MACs)  when  using  an  NLMS 
algorithm  for  updating  is  approximately  given  by 

$$C_{fullband} = 4 \cdot 2(L_{FF} + L_{FB}) = 8(L_{FF} + L_{FB}) \tag{2}$$

where the factor of 4 accounts for the required complex valued
arithmetic. The feedforward and feedback filter lengths
are represented by $L_{FF}$ and $L_{FB}$, respectively.

For our subband equaliser implementations, the
complexity of the filter banks has to be considered. In a fast
implementation, one analysis or synthesis filter bank
operation costs

$$C_{filterbank} = \frac{1}{N}(2L_p + 4K\log_2 K) \tag{3}$$

MACs per fullband sampling period [10].

Thus the complexity of subband structure I with the
FF part in subbands and 4 filter bank operations is

$$C_{subband,I} = \frac{K}{N}\, 4 \cdot 2(L_{FF}) + 4 \cdot 2(L_{FB}) + 4C_{filterbank}. \tag{4}$$

For subband structure II, we require

$$C_{subband,II} = \frac{K}{N}\, 4 \cdot 2(L_{FF} + L_{FB}) + 5C_{filterbank} \tag{5}$$

due to operating both FF and FB parts in subbands in the
structure and executing 5 filter banks.

The  channel  characteristic  in  Fig.  2  has  been  used  to  test 
the  fullband  and  subband  equalisers  introduced  in  Sec.  3. 
Quadrature  amplitude  modulation  (QAM)  signals  are  used 
in  our  simulation.  A  normalised  least  mean  square  (NLMS) 
algorithm  is  employed  for  adaptation  of  the  fullband  and 
subband  structure  II  adaptive  filters,  while  a  delay-NLMS 
is used in subband structure I. The normalised step size
of $\mu = 0.4$ is set for all equaliser structures. The delay
$\Delta$ for the different systems is set such that the FF part
targets almost only the pre-cursor, while the FB part of
the equaliser eliminates the post-cursor. For the subband
structures, the OSFBs split the fullband signal into K = 16
channels decimated by N = 14, with $L_p = 448$.

The filter length of the subband equalisers is selected
according to (1). The number of coefficients of the different
structures is listed in Tab. 1, where $L_{FF}$ refers to the filter
in the FF part and $L_{FB}$ to the FB part of the equaliser.


Table  1:  Number  of  coefficients  in  the  FF  and  FB  parts  of 
the  different  simulated  equaliser  structures. 


The  performance  of  the  three  —  fullband,  and  subband 
structure  I  and  II  —  equaliser  systems  is  assessed  in  terms 
of  achieved  mean  squared  error  (MSE)  and  bit  error  rate 
(BER),  whereby  both  the  learning  characteristic  as  well  as 
the  steady  state  are  of  interest. 

4.1.  Convergence  Behaviour 

The MSE learning characteristic of the three systems is
presented in Fig. 6. The curves are averaged over an
ensemble of 25 runs with a random 64-QAM input signal $u[n]$
in the absence of channel noise. In terms of convergence
rate, the subband structures exhibit a convergence speed
that is approximately twice as fast as the fullband equaliser,
whereby subband structure II attains faster initial MSE
convergence than structure I. The results indicate
that both subband structure I and II attain a considerably
better steady-state error performance than the fullband
system.

4.2.  Bit  Error  Rates 

We  further  examine  the  performance  of  the  fullband  equa¬ 
liser  and  subband  structure  II  in  terms  of  BER  for  various 
levels  of  QAM  over  the  previous  channel,  which  now  is  dis¬ 
turbed  by  noise  at  variable  SNR.  The  noise  is  independent 
of the transmitted signal. Additive white Gaussian noise
is coloured by the receive filter. The BER performance
results for 4-, 16-, 64-, and 256-QAM over variable SNR are
shown in Fig. 7. The displayed BER values are taken for
the steady-state case after adapting the equalisers for $5 \cdot 10^5$
symbol periods. In general, the fullband equaliser is
superior particularly for lower modulation levels at low SNR.


Figure 6: MSE performance for fullband and subband
(structure I and II) equalisers for a noise free channel.


Figure 7: BER performance of fullband (FB) and subband
(SB) structure II over variable channel SNR for various
modulation levels.


A  clear  advantage  for  the  steady-state  performance  of  the 
subband  structure  can  be  noted  for  higher  QAM  levels  (64- 
QAM  and  256-QAM)  at  higher  SNR  above  25  dB. 


4.3.  Computational  Cost  Comparison 

The filter lengths of the proposed subband structures,
selected according to (1), are given in Tab. 1. These filter
lengths have been set to achieve similar modelling
capabilities of the fullband and different subband structures. The
computational complexity of the equaliser structures,
calculated according to (2), (4), and (5), is displayed in the
second column of Tab. 2. The third column in Tab. 2
gives the computational cost of the subband
equaliser implementations relative to the fullband
realisation. Subband structures I and II require only 39% and
29%, respectively, of the fullband equaliser's computational
complexity.


Equaliser structure    MACs    % of Fullband
Fullband               4800    100%
Structure I            1882    39%
Structure II           1416    29%

Table 2: Computational cost comparison for different
equaliser structures.
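The complexity expressions can be evaluated with a short sketch. It assumes a filter bank cost of $(2L_p + 4K\log_2 K)/N$ MACs per fullband sample and a factor $K/N$ on the subband filter terms. The fullband split $L_{FF} = 480$, $L_{FB} = 120$ is hypothetical: only the sum of 600 follows from the 4800 MACs in Tab. 2, since the individual lengths of Tab. 1 are not reproduced here. Rounding the subband lengths up is also an assumption, so the subband totals are indicative rather than a reproduction of the table.

```python
import math

def fullband_macs(l_ff, l_fb):
    """NLMS fullband cost: 4 * 2 * (L_FF + L_FB)."""
    return 4 * 2 * (l_ff + l_fb)

def filterbank_macs(K, N, l_p):
    """Assumed cost of one analysis or synthesis bank per fullband sample."""
    return (2 * l_p + 4 * K * math.log2(K)) / N

def subband_length(l_fullband, l_p, N):
    """Subband filter length (L_Fullband + L_p) / N, rounded up (assumption)."""
    return math.ceil((l_fullband + l_p) / N)

def subband_I_macs(K, N, l_p, l_ff_sub, l_fb):
    """FF part in subbands, FB part at symbol rate, 4 filter bank operations."""
    return (K / N) * 4 * 2 * l_ff_sub + 4 * 2 * l_fb + 4 * filterbank_macs(K, N, l_p)

def subband_II_macs(K, N, l_p, l_ff_sub, l_fb_sub):
    """FF and FB parts in subbands, 5 filter bank operations."""
    return (K / N) * 4 * 2 * (l_ff_sub + l_fb_sub) + 5 * filterbank_macs(K, N, l_p)

K, N, L_P = 16, 14, 448
L_FF, L_FB = 480, 120                      # hypothetical split, sum = 600
c_full = fullband_macs(L_FF, L_FB)
l_ff_sub = subband_length(L_FF, L_P, N)
l_fb_sub = subband_length(L_FB, L_P, N)
c_I = subband_I_macs(K, N, L_P, l_ff_sub, L_FB)
c_II = subband_II_macs(K, N, L_P, l_ff_sub, l_fb_sub)
```

Under these assumptions both subband structures cost well under half of the fullband figure, with structure II the cheaper of the two, in line with the ordering reported in Tab. 2.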


5.  CONCLUSIONS 

This  paper  has  introduced  structures  for  subband  adaptive 
equalisation  and  presented  some  simulation  results.  An  im¬ 
portant  indication  from  these  results  is  that  for  severely 
distorting  channels  subband  equalisers  can  attain  a  faster 
convergence  rate  and  better  steady-state  error  than  their 
fullband  counterpart  with  a  gain  in  BER  for  high  SNR 
when  operating  in  higher  level  QAM  modes.  The  subband 
equalisers  were  implemented  at  a  reduced  computational 
cost  compared  to  the  fullband  system. 
ABSTRACT

We  present  a  technique  for  simulating  time-varying  mobile 
radio  channels.  This  technique  is  specifically  suited  to  the 
small  relative  Doppler  bandwidths  of  wideband  channels  en¬ 
countered  in  CDMA  and  OFDM  communications.  A  “sub¬ 
sampled”  ARMA  innovations  filter  and  multistage  interpola¬ 
tion  are  used  to  achieve  an  accurate  and  computationally  ef¬ 
ficient  approximation  of  specified  or  measured  Doppler  spec¬ 
tra  (scattering  functions).  We  discuss  the  calculation  of  the 
ARMA  coefficients  and  the  optimal  design  of  the  multistage 
interpolator.  Simulation  results  demonstrate  the  excellent 
performance  of  the  proposed  channel  simulator. 

1.  INTRODUCTION 

Computer simulation of mobile radio channels is of great importance for
the development and evaluation of mobile communications systems. A
discrete-time channel model that is convenient for channel simulation
is the time-varying tapped delay line (FIR filter) with input-output
relation [1,2]

$$y[n] = \sum_{m=0}^{M-1} h_m[n]\, x[n-m].$$

Here, $x[n]$ is the channel input signal, $y[n]$ is the channel output
signal, $h_m[n]$ is the channel's time-varying impulse response (with
$m$ the delay index and $n$ the time index), and $M-1$ is the maximum
delay. For wide-sense stationary uncorrelated scattering (WSSUS)
channels, each tap weight sequence $h_m[n]$ is a stationary random
process with autocorrelation function
$r_m[l] = E\{h_m[n+l]\, h_m^*[n]\}$, and different tap weight processes
$h_m[n]$, $h_{m'}[n]$ are uncorrelated [1,2]. The power spectra of the
$h_m[n]$,

$$S_m(\nu) = \sum_{l=-\infty}^{\infty} r_m[l]\, e^{-j2\pi\nu l}, \qquad m = 0, 1, \dots, M-1,$$

are termed the channel's Doppler spectra or scattering function [1,2].
Here, $\nu$ is the Doppler frequency normalized by the sampling
frequency. For wideband CDMA and OFDM systems, the sampling frequency
is significantly higher than the channel's maximum Doppler shifts. This
results in extremely small Doppler bandwidths, i.e., the Doppler
spectra $S_m(\nu)$ have extremely narrowband lowpass characteristics.
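As an illustration of the tapped-delay-line relation above, the following is a minimal NumPy sketch, not the authors' implementation. The function name `tv_tapped_delay_line` and the array layout `h[m, n] = h_m[n]` are assumptions made for this example, and samples of $x[n]$ before $n = 0$ are taken to be zero.

```python
import numpy as np

def tv_tapped_delay_line(x, h):
    """Apply y[n] = sum_m h_m[n] x[n-m] for n = 0..N-1.

    h has shape (M, N) with h[m, n] = h_m[n] (hypothetical layout);
    x must hold at least N samples, and M <= N is assumed.
    """
    x = np.asarray(x)
    h = np.asarray(h)
    M, N = h.shape
    y = np.zeros(N, dtype=np.result_type(x, h))
    for m in range(M):
        # Tap m weights the input delayed by m samples, sample by sample.
        y[m:] += h[m, m:] * x[:N - m]
    return y
```

With taps that do not vary in time, the function reduces to an ordinary FIR convolution, which gives a simple sanity check.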

*This work was supported by FWF grant PI 1904-TEC.

Figure 1: Structure of the proposed channel simulator: Generating a
realization of a tap weight process $h_m[n]$.

From the discussion above, it follows that the simulation of a WSSUS
channel amounts to generating realizations of $M$ uncorrelated,
stationary tap processes $h_m[n]$ ($m = 0, 1, \dots, M-1$) whose
second-order statistics should conform to the specified $r_m[l]$ or,
equivalently, $S_m(\nu)$. In this paper, we consider a tap process
generator based on an autoregressive moving-average (ARMA) innovations
filter [3, 4] that is driven by stationary white Gaussian noise. To
avoid the high ARMA model order that would normally be needed for
achieving the small relative Doppler bandwidths encountered in wideband
CDMA and OFDM systems, we propose a "subsampled" ARMA innovations
filter that is designed using a subsampled autocorrelation

$$r'_m[n] = r_m[nL]. \qquad (1)$$

To compensate for the subsampling, the ARMA filter is followed by a
multistage interpolator [5] for which we propose an MSE-optimal design.
A block diagram of the resulting tap process generator is shown in
Fig. 1. The generic structure of a channel simulator consisting of
innovations filters and interpolators was previously considered in [6].

The proposed channel simulator has numerous advantages over other
channel simulation techniques [1, 7-10]: the ARMA modeling approach
allows accurate approximation of arbitrary Doppler spectra; arbitrarily
long tap sequences can be generated online; new realizations are
generated in each simulation run; the simulated channel is guaranteed
to be Rayleigh fading; and finally, the multistage interpolator allows
for efficient implementation, simplified interpolator filter design,
and easy adjustment of the Doppler bandwidth without modification of
the ARMA filter.

The rest of the paper is organized as follows. Section 2 discusses the
subsampled ARMA innovations filter and presents methods for calculating
the filter coefficients. Section 3 considers the multistage
interpolator and its optimal design. Finally, simulation results are
provided in Section 4.

2.  ARMA  INNOVATIONS  FILTER 

The subsampled ARMA innovations filter is an IIR filter described by
the difference equation [3]

$$\sum_{k=0}^{P} a[k]\, h'[n-k] = \sum_{k=0}^{Q} b[k]\, w[n-k], \quad \text{with } a[0] = 1, \qquad (2)$$
where $a[k]$ and $b[k]$ are the autoregressive (AR) and moving-average
(MA) coefficients, respectively, $P$ and $Q$ are the AR and MA model
orders, respectively, $h'[n]$ is the filter output (the subsampled tap
process; note that we suppress the subscript $m$ in $h'_m[n]$ etc.),
and $w[n]$ is the filter input that is chosen as white Gaussian noise.
Multiplying (2) by $h'^*[n-l]$ and taking expectations yields [3]

$$\sum_{k=0}^{P} a[k]\, r'[l-k] = \sum_{k=0}^{Q} b[k]\, c^*[k-l], \qquad l \in \mathbb{Z}, \qquad (3)$$

where $r'[n] = r[nL]$ is the subsampled autocorrelation and $c[n]$ is
the impulse response of the ARMA filter. This is a nonlinear equation
in the ARMA coefficients $a[k]$ and $b[k]$ since $c[n]$ depends on
$a[k]$ and $b[k]$.

In the frequency domain, (3) becomes

$$S'(\nu) = \frac{B(\nu)}{A(\nu)}\, C^*(\nu) = \frac{|B(\nu)|^2}{|A(\nu)|^2}, \qquad (4)$$

where $S'(\nu)$, $A(\nu)$, $B(\nu)$, and $C(\nu) = B(\nu)/A(\nu)$ are
the Fourier transforms of $r'[n]$, $a[n]$, $b[n]$, and $c[n]$,
respectively.

For calculation of the AR and MA coefficients, the specified
(subsampled) autocorrelation and Doppler spectrum are substituted for
$r'[n]$ in (3) and for $S'(\nu)$ in (4), respectively. Typically, the
ARMA model will only provide an approximation to the specified $r'[n]$
and $S'(\nu)$, and thus (3) and (4) will be satisfied only
approximately.


2.1.  Calculation  of  the  AR  Coefficients 

Usually, the AR coefficients $a[n]$ are estimated from higher-lag (and,
thus, smaller) values of $r'[n]$ that are not influenced by the MA
model part [3]. However, here we include the central (largest) values
of $r'[n]$ since we observed this to yield better accuracy of the
overall ARMA approximation. This approach means that we first fit an AR
filter and then fit an MA filter to the resulting residual
autocorrelation. Formally setting $Q = 0$ and noting that the ARMA
filter impulse response $c[n]$ is causal, (3) for $l = 1, 2, \dots, N$
together with $a[0] = 1$ yields the Yule-Walker equation [3]

$$\mathbf{R}\,\mathbf{a} = -\mathbf{r}, \qquad (5)$$

where

$$\mathbf{R} = \begin{bmatrix} r'[0] & r'[-1] & \cdots & r'[-P+1] \\ r'[1] & r'[0] & \cdots & r'[-P+2] \\ \vdots & \vdots & & \vdots \\ r'[N-1] & r'[N-2] & \cdots & r'[N-P] \end{bmatrix},$$

$\mathbf{a} = [a[1] \cdots a[P]]^T$, and
$\mathbf{r} = [r'[1] \cdots r'[N]]^T$, with $N$ the number of samples
of $r'[n]$ that are used for the calculation. Equation (5) is a system
of $N$ linear equations in the $P$ unknowns $a[n]$. For $N > P$, the
least-squares solution of (5) (stabilized by diagonal loading [11,
Chap. 7.4]) is given by

$$\mathbf{a} = -(\tilde{\mathbf{R}}^H \tilde{\mathbf{R}})^{-1} \tilde{\mathbf{R}}^H \mathbf{r}, \quad \text{with } \tilde{\mathbf{R}} = \mathbf{R} + \gamma \mathbf{I}.$$

Here, $\gamma$ is a suitable loading parameter ensuring that all poles
of the AR filter are inside the unit circle. For good results, $N$ must
be chosen much larger than $P$.
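The loaded least-squares fit above can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions rather than the authors' code: the function name `fit_ar_loaded` is hypothetical, negative lags are filled in via $r'[-l] = r'^*[l]$, and because $\mathbf{R}$ is $N \times P$ the loading term is interpreted as $\gamma$ added to the main diagonal of $\mathbf{R}$ (an $N \times P$ "identity").

```python
import numpy as np

def fit_ar_loaded(r_pos, P, N, gamma=0.0):
    """Least-squares solution of R a = -r with diagonal loading.

    r_pos[l] = r'[l] for l = 0..N (hypothetical input layout).
    Returns the AR coefficients a[1], ..., a[P]; a[0] = 1 is implied.
    """
    r_pos = np.asarray(r_pos, dtype=complex)

    def r_at(l):
        # Autocorrelation symmetry: r'[-l] = conj(r'[l]).
        return r_pos[l] if l >= 0 else np.conj(r_pos[-l])

    # N x P matrix with entries R[i, j] = r'[i - j].
    R = np.array([[r_at(i - j) for j in range(P)] for i in range(N)])
    r = r_pos[1:N + 1]
    # Assumption: loading adds gamma to the main diagonal of the N x P matrix.
    R_loaded = R + gamma * np.eye(N, P)
    RH = R_loaded.conj().T
    return -np.linalg.solve(RH @ R_loaded, RH @ r)
```

For an exact AR(1) autocorrelation $r'[l] = \rho^{|l|}$ and $\gamma = 0$, the fit should return $a[1] = -\rho$ with the remaining coefficients equal to zero, which is a convenient check before applying the routine to a specified Doppler spectrum.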

Criteria for selecting the AR model order $P$ are discussed in [3].
However, it is also noted in [3] that these criteria seem to work well
only for a true AR process. For a Doppler bandwidth of $10^{-4}$ and
subsampling factor $L = 1000$, we obtained good results with
$P = 10, \dots, 50$ and $N = 3P, \dots, 10P$.


2.2.  Calculation  of  the  MA  Coefficients 

Once the AR coefficients $a[n]$ have been determined, we can proceed to
calculate the MA coefficients $b[n]$. With Durbin's method [3,4], the
MA modeling problem is transformed into two AR modeling problems of
which one has significantly higher order than the MA model order $Q$.
However, since in our case $Q = 100, \dots, 1000$ to ensure good
approximation accuracy, Durbin's method would be extremely expensive.
Therefore, here we propose a modified version of the extended Prony
method described in [3]. Our method is also related to the
Blackman-Tukey spectral estimator [4].

Equation (4) can be rewritten as

$$|B(\nu)|^2 = |A(\nu)|^2\, S'(\nu). \qquad (6)$$

Basically, $b[n]$ can be obtained by causal factorization of
$|B(\nu)|^2$. In the time domain, (6) reads

$$\beta[n] = \alpha[n] * r'[n], \qquad (7)$$

with $\alpha[n] = a[n] * a^*[-n]$ and $\beta[n] = b[n] * b^*[-n]$. But
from $\beta[n] = b[n] * b^*[-n]$, it follows that $\beta[n]$ should
have finite support $[-Q, Q]$ and a nonnegative real Fourier transform.
Therefore, we "correct" (7) by applying a Bartlett window to the
right-hand side,

$$\beta[n] = \begin{cases} (\alpha[n] * r'[n]) \left(1 - \dfrac{|n|}{Q}\right), & |n| \le Q \\ 0, & |n| > Q. \end{cases} \qquad (8)$$

This enforces both finite support $[-Q, Q]$ and a nonnegative real
Fourier transform (since the Fourier transforms of both
$\alpha[n] * r'[n]$ and $1 - |n|/Q$ are nonnegative real). Finally,
$b[n]$ is obtained by causal factorization of $\beta[n]$ [12, App. A].
To this end, the cepstrum of $\beta[n]$ in (8) is calculated [13].
Transforming the cepstrum's causal part back into the original domain
yields the MA coefficients $b[n]$, $n = 0, 1, \dots, Q$. The overall
technique was observed to produce similar results as Durbin's method
(see Section 4) at significantly reduced computational complexity.
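The two steps of this MA design (Bartlett-windowed $\beta[n]$, then cepstral factorization) can be sketched as below. This is a hedged illustration, not the authors' implementation: the function names, the FFT length `nfft`, the small spectral floor that guards the logarithm, and the window denominator $Q$ are all assumptions of this example.

```python
import numpy as np

def windowed_beta(a, r_sym, Q):
    """beta[n] for n = -Q..Q: (alpha * r')[n] times a Bartlett window.

    a = [a[0], ..., a[P]]; r_sym holds r'[n] for n = -(Q+P)..(Q+P).
    """
    a = np.asarray(a)
    P = len(a) - 1
    alpha = np.convolve(a, np.conj(a[::-1]))   # alpha[n], lags -P..P
    full = np.convolve(alpha, r_sym)           # lags -(Q+2P)..(Q+2P)
    centre = Q + 2 * P                         # index of lag 0
    n = np.arange(-Q, Q + 1)
    return full[centre - Q:centre + Q + 1] * (1.0 - np.abs(n) / Q)

def minimum_phase_factor(beta, Q, nfft=4096):
    """Causal (minimum-phase) b[0..Q] with b[n] * conj(b[-n]) ~= beta[n]."""
    beta = np.asarray(beta, dtype=complex)
    buf = np.zeros(nfft, dtype=complex)
    buf[np.arange(-Q, Q + 1) % nfft] = beta    # circular placement of lags
    S = np.fft.fft(buf).real                   # |B(nu)|^2 on the FFT grid
    S = np.maximum(S, 1e-12 * S.max())         # assumed floor to keep log finite
    c = np.fft.ifft(np.log(S))                 # cepstrum of |B|^2
    fold = np.zeros(nfft)                      # keep the causal part only
    fold[0] = 0.5
    fold[1:nfft // 2] = 1.0
    fold[nfft // 2] = 0.5
    B = np.exp(np.fft.fft(c * fold))
    return np.fft.ifft(B)[:Q + 1]
```

As a check of the factorization step alone, $\beta[n] = \{0.5, 1.25, 0.5\}$ is the autocorrelation of the minimum-phase sequence $b = \{1, 0.5\}$, so `minimum_phase_factor` should recover that sequence up to numerical precision.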

Criteria for selecting the MA model order $Q$ are discussed in [3]. We
found that for good approximation accuracy, the MA filter should have
the same length as the central (dominant) part of the subsampled
autocorrelation $r'[n]$. For a Doppler bandwidth of $10^{-4}$ and
subsampling factor $L = 1000$, we obtained good results with
$Q = 100, \dots, 1000$.

3.  MULTISTAGE  INTERPOLATOR 

To compensate for the subsampling of $r[n]$ in (1), the output $h'[n]$
of the subsampled ARMA filter is interpolated by the subsampling factor
$L$. If $L$ is chosen as a composite number, i.e.,
$L = \prod_{k=0}^{K-1} L_k$, a particularly efficient multistage
interpolator [5] can be used. Here, interpolation by $L$ is performed
by $K$ successive interpolator stages with interpolation factors $L_k$.
Each interpolator stage is represented using a polyphase decomposition.
The input-output relations of the individual interpolator stages are
[5]

$$h^{(k+1)}[n] = u^{(k)}_{n \bmod L_k}\big[\lfloor n/L_k \rfloor\big], \qquad k = 0, 1, \dots, K-1$$

(where $\lfloor \xi \rfloor$ denotes the largest integer $\le \xi$),
with

$$u_i^{(k)}[n] = \sum_{l=-V_k}^{V_k-1} p_i^{(k)}[l]\, h^{(k)}[n-l], \qquad i = 0, 1, \dots, L_k - 1. \qquad (9)$$

Here, $h^{(k)}[n]$ and $h^{(k+1)}[n]$ are the input and output,
respectively, of the $k$th interpolator stage (in particular,
$h^{(0)}[n] = h'[n]$ and $h^{(K)}[n] = h[n]$, cf. Fig. 1), and
$p_i^{(k)}[n]$ and $u_i^{(k)}[n]$ are the impulse response (of length
$2V_k$) and output sequence, respectively, of the $i$th polyphase
filter of the $k$th interpolator stage.
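One interpolator stage in polyphase form can be sketched as follows. This is a minimal illustration, assuming the coefficient layout `p[i, l + V_k] = p_i[l]` and zero-valued input outside the given block; the function name `polyphase_stage` is hypothetical.

```python
import numpy as np

def polyphase_stage(h_in, p):
    """Interpolate h_in by L_k using polyphase filters.

    p has shape (L_k, 2*V_k) with p[i, l + V_k] = p_i[l] for
    l = -V_k..V_k-1 (assumed layout). Input samples outside the block
    are treated as zero, so the block edges are only approximate.
    """
    h_in = np.asarray(h_in)
    p = np.asarray(p)
    L, two_V = p.shape
    V = two_V // 2
    N = len(h_in)
    out = np.zeros(N * L, dtype=np.result_type(h_in, p))
    for i in range(L):
        conv = np.convolve(h_in, p[i])   # conv[m] equals u_i[m - V]
        out[i::L] = conv[V:V + N]        # output sample n*L + i is u_i[n]
    return out
```

For example, with $L_k = 2$ and $V_k = 1$, the branches $p_0 = \{0, 1\}$ and $p_1 = \{0.5, 0.5\}$ (listed for $l = -1, 0$) implement linear interpolation, so the input `[0, 2, 4]` yields `0, 1, 2, 3, 4` followed by one edge sample affected by the zero padding.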

We propose a mean square error (MSE) optimal design of the multistage
interpolator that is analogous to the "deterministic MSE design"
described in [5, Sec. 4.3.6]. The $k$th interpolator stage is designed
such that the MSE

$$\mathrm{MSE}^{(k)} = E\big\{\big|h^{(k+1)}[n] - \hat{h}^{(k+1)}[n]\big|^2\big\}$$

is minimized. Here, $\hat{h}^{(k+1)}[n]$ is the output of an ideal
lowpass interpolator with appropriate cutoff frequency.

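The output signals of the polyphase branches within the $k$th
interpolator stage are nonoverlapping in time and thus orthogonal.
Hence, $\mathrm{MSE}^{(k)}$ decomposes into the branch components
$\mathrm{MSE}_i^{(k)}$, $i = 0, 1, \dots, L_k - 1$, with

$$\mathrm{MSE}_i^{(k)} = E\big\{\big|u_i^{(k)}[n] - \hat{u}_i^{(k)}[n]\big|^2\big\},$$

where $\hat{u}_i^{(k)}[n] = \sum_{l=-\infty}^{\infty} \hat{p}_i^{(k)}[l]\, h^{(k)}[n-l]$
(cf. (9)) is the output signal of an ideal interpolator polyphase
filter with transfer function
$\hat{P}_i^{(k)}(\nu) = e^{j2\pi i \nu / L_k}$ [5]. This means that the
individual polyphase filters $p_i^{(k)}[n]$ can be designed
independently by separately minimizing the MSE components
$\mathrm{MSE}_i^{(k)}$. It is easily verified that
$\mathrm{MSE}_i^{(k)}$ can be expressed in the frequency domain as

$$\mathrm{MSE}_i^{(k)} = \int_{-1/2}^{1/2} \big|P_i^{(k)}(\nu) - \hat{P}_i^{(k)}(\nu)\big|^2\, S^{(k)}(\nu)\, d\nu. \qquad (10)$$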

Here, $S^{(k)}(\nu) = \sum_{n=-\infty}^{\infty} r^{(k)}[n]\, e^{-j2\pi\nu n}$
with $r^{(k)}[n] = r[n I_k]$, $I_k = \prod_{i=k}^{K-1} L_i$, is the
Doppler spectrum of the input process of the $k$th interpolator stage.
Inserting
$P_i^{(k)}(\nu) = \sum_{n=-V_k}^{V_k-1} p_i^{(k)}[n]\, e^{-j2\pi\nu n}$
and $\hat{P}_i^{(k)}(\nu) = e^{j2\pi i \nu / L_k}$ into (10) and setting
the derivatives of the resulting expression with respect to
$p_i^{(k)}[n]$ ($n = -V_k, \dots, V_k - 1$) equal to zero [3], we
obtain the following equations for the MSE-optimal interpolator
coefficients $p_i^{(k)}[n]$,

$$\sum_{l=-V_k}^{V_k-1} p_i^{(k)}[l]\, r^{(k)}[n-l] = r^{(k+1)}[nL_k + i], \qquad n = -V_k, \dots, V_k - 1.$$

This can be written as

$$\mathbf{R}^{(k)}\, \mathbf{p}_i^{(k)} = \mathbf{r}_i^{(k+1)}, \qquad (11)$$

with the vectors
$\mathbf{p}_i^{(k)} = [p_i^{(k)}[-V_k] \cdots p_i^{(k)}[V_k - 1]]^T$,
$\mathbf{r}_i^{(k+1)} = [r^{(k+1)}[-V_k L_k + i] \cdots r^{(k+1)}[(V_k - 1)L_k + i]]^T$
and the Hermitian Toeplitz matrix $\mathbf{R}^{(k)}$ with first row
$[r^{(k)}[0] \cdots r^{(k)}[-2V_k + 1]]$. Since $\mathbf{R}^{(k)}$ does
not depend on $i$, (11) is best solved by explicitly inverting
$\mathbf{R}^{(k)}$ using the Levinson algorithm [3,4], which has to be
done only once per interpolator stage $k$.
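The normal equations (11) for one stage can be solved as in the following sketch. It is an illustration under assumptions, not the authors' design code: the names `design_stage` and `rect_autocorr` are hypothetical, a general dense solver with multiple right-hand sides stands in for the Levinson algorithm, and the rectangular Doppler spectrum (with $r[0] = 1$) corresponds to the default design mentioned in the text.

```python
import numpy as np

def design_stage(r_fine, L_k, V_k):
    """Solve R^(k) p_i = r_i^(k+1) for all branches i = 0..L_k-1.

    r_fine(m) must return r^(k+1)[m], the autocorrelation at the stage
    output rate, so that r^(k)[n] = r_fine(n * L_k).
    Returns an array of shape (L_k, 2*V_k) with row i = p_i[-V_k..V_k-1].
    """
    n = np.arange(-V_k, V_k)
    # Toeplitz matrix with entries R[n, l] = r^(k)[n - l].
    R = r_fine((n[:, None] - n[None, :]) * L_k)
    # Column i of rhs holds r^(k+1)[n*L_k + i] for n = -V_k..V_k-1.
    rhs = np.stack([r_fine(n * L_k + i) for i in range(L_k)], axis=1)
    # One factorization of R serves every branch (Levinson not used here).
    return np.linalg.solve(R, rhs).T

def rect_autocorr(nu_max):
    """Autocorrelation of a rectangular Doppler spectrum on |nu| < nu_max."""
    return lambda m: np.sinc(2.0 * nu_max * np.asarray(m, dtype=float))
```

A quick consistency check is branch $i = 0$: its right-hand side equals the column of $\mathbf{R}^{(k)}$ belonging to $l = 0$, so the optimal $p_0^{(k)}[l]$ should be a unit impulse at $l = 0$. For the very small Doppler bandwidths considered in this paper the matrix becomes ill-conditioned, so a practical design would likely need additional numerical care.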

The MSE-optimal multistage interpolator explicitly depends on the exact
shape of the specified Doppler spectrum $S(\nu)$. If this dependence is
undesired, one may employ a suboptimal default design using a
rectangular $S(\nu)$ that is constant within the Doppler bandwidth and
zero outside.


Figure 2: Simulation of a single-tap channel: (a) Specified (Jakes)
Doppler spectrum, simulated Doppler spectrum, and simulated Doppler
spectrum obtained without subsampling/interpolation; (b) simulated
Doppler spectrum obtained at the output of the second interpolator
stage.


Figure 3: Squared magnitude of the frequency responses of the
subsampled ARMA filter obtained with the proposed MA design method and
with Durbin's method.


4.  SIMULATION  RESULTS 


4.1.  Simulation  1:  Jakes  Doppler  Spectrum 


Fig. 2 and Fig. 3 consider the simulation of a single-tap channel
($M = 1$) with a Jakes Doppler spectrum [1]

$$S_0(\nu) = \begin{cases} \dfrac{1}{\pi \nu_{\max} \sqrt{1 - (\nu/\nu_{\max})^2}}, & |\nu| < \nu_{\max}, \\ 0, & \text{else.} \end{cases}$$

The relative Doppler bandwidth was $\nu_{\max} = 10^{-4}$
(corresponding, e.g., to an absolute Doppler bandwidth of 100 Hz when a
sampling frequency of 1 MHz is used). The subsampling/interpolation
factor was chosen as $L = 1000$. An ARMA filter of order $P = 20$,
$Q = 100$ was designed using parameters $N = 200$ and
$\gamma = 6 \cdot 10^{-10}$. The multistage interpolator comprised 3
stages with $L_0 = 5$, $L_1 = 40$, $L_2 = 5$ and polyphase filter
lengths $V_0 = 5$, $V_1 = 2$, $V_2 = 10$. It was designed using a
rectangular default Doppler spectrum.

Fig. 2(a) shows the specified (Jakes) Doppler spectrum $S_0(\nu)$ of
the tap process $h_0[n]$ as well as an estimate of the Doppler spectrum
of the simulated channel (estimated from 200 realizations of $h_0[n]$
with length $10^6$ each). It can be seen that the approximation is very
accurate. Fig. 2(a) also shows the estimated Doppler spectrum obtained
with an ARMA channel simulator that used the same ARMA model order
($P = 20$, $Q = 100$) but no subsampling/interpolation. It is seen that
for a given ARMA model order, the subsampling/interpolation technique
yields a substantial performance improvement. (In fact, without
subsampling/interpolation, an MA order of $Q = 10^4, \dots, 10^5$ would
be needed to obtain a comparable performance.)

Figure 4: Simulation of a realistic channel: (a) Specified scattering
function (estimated from measured channel data); (b) simulated
scattering function (estimated from realizations of the simulated
impulse response); (c) magnitude of a realization of the simulated
impulse response.

Fig. 2(b) shows an estimate of the Doppler spectrum of the simulated
tap process at the output of the second interpolator stage (i.e.,
before the final interpolator stage). Since the interpolation factor of
the final interpolator stage is $L_2 = 5$, the tap process after the
second interpolator stage has Doppler bandwidth
$L_2 \nu_{\max} = 5 \cdot 10^{-4}$. This shows that it is possible to
simultaneously generate channels with equal Doppler profile but
different Doppler bandwidths.

Fig. 3 compares the frequency responses of the subsampled ARMA filter
obtained with our MA design method (see Subsection 2.2) and with
Durbin's method [3,4]. Identical AR coefficients were used. Note that
these frequency responses do not include the interpolator; the Doppler
bandwidth of 0.1 in Fig. 3 corresponds to a Doppler bandwidth of
$10^{-4}$ after interpolation by $L = 1000$. It can be seen that our
efficient method achieves slightly better stop-band attenuation than
Durbin's method.

4.2.  Simulation  2:  Realistic  Channel 

Fig. 4(a) shows a specified scattering function that was estimated [14]
from channel data measured in a suburban area.$^1$ To each one of the
$M = 6$ specified Doppler spectra $S_m(\nu)$, $m = 0, 1, \dots, 5$, we
designed a corresponding subsampled ARMA filter of order $P = 50$ and
$Q = 1000$ using parameters $N = 500$ and $\gamma = 10^{-5}$. The
subsampling/interpolator parameters were as in Subsection 4.1.
Fig. 4(b) shows an estimate of the scattering function $S_m(\nu)$
derived from 200 realizations of the simulated impulse response (tap
processes) $h_m[n]$, $m = 0, 1, \dots, 5$, each of length
$5 \cdot 10^5$. It is seen that the channel simulator achieves a good
approximation of the specified scattering function. Finally, a segment
of a simulated impulse response $h_m[n]$ is shown in Fig. 4(c).

5.  CONCLUSIONS 

We have presented a technique for simulating time-varying mobile radio
channels that is specifically suited to the small relative Doppler
bandwidths encountered in wideband CDMA and OFDM communications. The
combination of a "subsampled" ARMA innovations filter with a multistage
interpolator was shown to yield substantial advantages regarding
accuracy, efficiency, and flexibility.

1 Courtesy of T-Nova Deutsche Telekom Innovationsgesellschaft mbH,
Technologiezentrum Darmstadt, Germany.
ABSTRACT

This paper addresses the problem of estimating a rapidly fading
convolutionally coded signal such as might be found in a wireless
telephony or data network. We model both the channel gain and the
convolutionally coded signal as Markov processes, and thus the noisy
received signal as a hidden Markov process (HMP). Two now-classical
methods for estimating finite-state hidden Markov processes are the
Viterbi algorithm and the a posteriori probability (APP) filter. A
hybrid recursive estimation procedure is derived whereby one hidden
process (the encoder state in our application) is estimated using a
Viterbi-type (i.e. sequence-based) cost and the other (the fading
process) using an APP-based cost such as maximum a posteriori
probability. Using simulations, the performance of the optimal scheme
is compared with a number of suboptimal techniques: decision-directed
Kalman and HMP predictors, and Kalman filter and HMP filter
per-survivor processing (PSP) techniques. Superior performance of the
optimal scheme is demonstrated with little extra computational
requirement compared to the PSP techniques.

1.  INTRODUCTION 

In wireless telephony and data networks, propagation characteristics of
the radio channel give rise to often rapid fluctuations in the received
signal power [1]. For multilevel signalling constellations such as
Pulse Amplitude Modulation (PAM) and Quadrature Amplitude Modulation
(QAM), it is necessary for the receiver to have a good estimate of the
instantaneous channel power gain in order to properly demodulate the
signal. For many practical channels, the channel power gain may vary so
quickly that gain estimation methods based on a static model of the
channel gain (e.g. adaptive methods, maximum likelihood) may not track
sufficiently quickly to permit demodulation of the signal. Thus dynamic
models for the channel gain should be applied in such cases. Dynamic
models will give rise to estimation structures which are designed to
track more quickly, and thus should improve performance. In this paper,
we specify a finite state Markov chain to model the amplitude gain
process.

Most wireless telecommunications signals employ forward error
correction (FEC) at the physical layer to give protection against
symbol errors introduced by noise on the channel. The most common type
of FEC is convolutional coding [2]. Convolutional coding works by
adding redundancy (linear dependence) into the transmitted symbol
stream by multiple-input multiple-output linear FIR filtering
(modulo 2). The maximum delay in the filter is called the constraint
length of the encoder. An encoder which produces $n$ output bits for
each $m$ input bits is called a rate $m/n$ encoder. Commonly used rates
are 1/2, 3/4, 5/6 and 7/8; however, for some applications (e.g. deep
space communications) rates as low as 1/128 might be used. In this
paper, we consider only rate $1/n$, $n \ge 2$ encoding. A
convolutionally encoded signal may be represented as a hidden Markov
model (HMM) with state consisting of all the input bits stored in the
encoder memory, and observation consisting of the output symbol stream.
The transition structure of the state is highly constrained. For
example, a rate 1/2 encoder of constraint length $M$ has $2^M$ states
(corresponding to all possible combinations of the $M$ stored input
bits in the encoder), but there are only 2 possible transitions from
each state, corresponding to the 2 possibilities for the next input
bit. In such highly constrained problems, it is recognised that Maximum
Likelihood Sequence Estimation (MLSE) should be used, leading to the
well-known Viterbi algorithm (VA) [4], where it is demonstrated that
MLSE yields (asymptotically) the optimal error performance.

In this paper we model our received signal as the product of the
channel gain process and the convolutionally encoded process observed
in additive white Gaussian noise. Thus we have an HMP dependent on two
underlying Markov chains, one being the state of the convolutional
encoder, and the other being the state of the channel gain process. We
derive an optimal mixed estimation algorithm, whereby we seek MLSE for
the encoder state, and maximum a posteriori probability (MAP) estimates
for the channel gain process. Such an algorithm clearly involves joint
estimation of both underlying Markov process states, albeit with
different criteria used to determine each component. The MLSE for the
encoder then allows us to extract the original input bit sequence.

As a comparison, we use two classes of suboptimal approaches. The
simplest class is a decoupled structure consisting of an estimator for
the channel gain process, combined with a standard MLSE algorithm
applied to estimate the encoder state. This structure mimics in some
sense the usual automatic gain control (AGC) commonly used in
receivers. Decision feedback of delayed symbols is used to parameterise
the channel gain estimator. The other suboptimal methods used are based
on Per-Survivor Processing (PSP) [8]. Here a bank of amplitude
estimators is used, each associated with a surviving candidate optimal
path from the MLSE. There is no requirement for feedback of delayed (or
otherwise) symbols with these PSP methods. Within each class, we
investigate the performance of 2 types of amplitude estimators. The
first type is based on an AR(1) model for the amplitude process, and
results in a Kalman filter based amplitude estimator. The same AR(1)
model is used to derive the Kalman filter based PSP method similar to
[9] (which also addresses the frequency selective fading case). In each
case, the second order statistics of the Markov chain amplitude process
are used to parameterise the Kalman filter(s). The other type of
estimator uses the finite state Markov chain model itself to derive the
corresponding HMP filter(s) for the amplitude process in both decision
feedback and PSP modes of operation. Performance of the optimal and the
4 suboptimal techniques is compared with the aid of simulated 4-level
Pulse Amplitude Modulated (PAM) signals.


such a model in [7]. An additional reason for such a choice is the
applicability of an estimation theory based on the
Expectation-Maximisation algorithm [3] which we address in forthcoming
work [6]. Write $a = (a_1, \dots, a_{N^{(1)}})^T \in \mathbb{R}^{N^{(1)}}$
and suppose the chain determining the fading dynamics is
$X^{(1)} = \{X_k^{(1)}\}$ where $X_k^{(1)}$ takes values in the
(canonical) set of unit vectors

$$\{e_1^{(1)}, \dots, e_{N^{(1)}}^{(1)}\}. \qquad (4)$$

Then the real value (gain) associated with the fading channel at time
$k$ is $\langle X_k^{(1)}, a \rangle$. We suppose the transition matrix
$A^{(1)} = (a_{ij}^{(1)})$ of $X^{(1)}$ has entries

2.  SIGNAL  AND  CHANNEL  MODEL 

We will consider convolutionally coded signals with constraint length
$M$. Denote by $X_k$ the length $M$ binary vector being the
convolutional encoder state at sample time $k$. This process follows a
'shift-register' type behaviour so that for $k \ge 0$,

$$X_{k+1} = S X_k + e_1 b_{k+1}. \qquad (1)$$

Here $S$ is the $M \times M$ shift matrix with $S_{ij} = 1$ if
$i = j + 1$, and zero otherwise, and $e_1$ is the unit vector in
$\mathbb{R}^M$ with unity in the first position. The sequence $\{b_k\}$
denotes the input binary message stream which is independent and takes
the values 0 and 1 with equal probability. Consequently, the state
space of $X$ has $2^M = N^{(2)}$ binary vectors. This state space can
be identified with the set $\{e_1^{(2)}, \dots, e_{N^{(2)}}^{(2)}\}$ of
unit vectors in $\mathbb{R}^{N^{(2)}}$. We shall write $X^{(2)}$ for
the version of $X$ defined in the canonical space
$\{e_1^{(2)}, \dots, e_{N^{(2)}}^{(2)}\}$. Each basis vector
$e_i^{(2)}$ corresponds to one binary vector in $\{0, 1\}^M$. Each
binary vector corresponds to a decimal integer, so we shall choose the
(decimal) index $i$ so that $e_i$ is associated with the corresponding
binary vector. Any vector $X_k$ has only two possible successor states.
The transition matrix $A^{(2)}$ for $X^{(2)}$, therefore, is sparse
with elements

$$a_{ij}^{(2)} = \begin{cases} 0.5, & i = 2j \bmod N^{(2)} \\ 0.5, & i = 2j + 1 \bmod N^{(2)} \\ 0, & \text{else.} \end{cases} \qquad (2)$$
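The shift-register recursion (1) and the resulting sparse transition structure (2) can be checked with the small sketch below. It is illustrative only: the function names are hypothetical, states are numbered from 0 rather than 1, and the decimal index is assumed to treat the first (newest) bit as the least significant bit, which is the convention under which the successors of state $j$ are $2j \bmod N^{(2)}$ and $(2j + 1) \bmod N^{(2)}$.

```python
import numpy as np

def shift_matrix(M):
    """M x M shift matrix with S[i, j] = 1 if i == j + 1, else 0."""
    return np.eye(M, k=-1, dtype=int)

def next_state(x, b):
    """One step of X_{k+1} = S X_k + e_1 b_{k+1} for a binary state vector x."""
    M = len(x)
    e1 = np.zeros(M, dtype=int)
    e1[0] = 1
    return shift_matrix(M) @ x + e1 * b

def state_index(x):
    """Decimal index of a state (assumption: first bit is least significant)."""
    return int(sum(int(bit) << m for m, bit in enumerate(x)))

def transition_matrix(M):
    """Column-stochastic matrix with A[i, j] = Pr(next index i | current index j)."""
    N = 2 ** M
    A = np.zeros((N, N))
    for j in range(N):
        x = np.array([(j >> m) & 1 for m in range(M)])
        for b in (0, 1):  # equiprobable input bits
            A[state_index(next_state(x, b)), j] += 0.5
    return A
```

For $M = 3$ this produces an $8 \times 8$ matrix in which every column contains exactly two entries of 0.5, at rows $2j \bmod 8$ and $(2j + 1) \bmod 8$.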


$$a_{ij}^{(1)} = \begin{cases} p, & i = 1,\ j = 2 \\ q, & i = 2,\ j = 1 \\ \lambda, & i = 2, \dots, N^{(1)} - 1,\ j = i + 1 \\ \mu, & [\ldots] \end{cases} \qquad (5)$$
with diagonal elements $a_{jj}^{(1)}$ chosen so that each column of
$A^{(1)}$ sums to unity. The received signal is given by

$$y_k = \langle X_k^{(1)}, a \rangle\, \mathcal{M}(G X_k \bmod 2) + \sigma n_k \qquad (6)$$

where $\{n_k\}$ is a sequence of independent normal $N(0, 1)$ random
variables, and $\sigma^2$ is the noise power. When $X_k$ is in the
state corresponding to the vector $e_i^{(2)}$, write
$d = (d_1, \dots, d_{N^{(2)}})^T$, with

$$d_i = \mathcal{M}(G X_k \bmod 2) \qquad (7)$$

so that $\mathcal{M}(G X_k \bmod 2) = d_i = \langle X_k^{(2)}, d \rangle$.
Our observation process can thus be written in terms of the canonical
state variables as

$$y_k = \langle X_k^{(1)}, a \rangle\, \langle X_k^{(2)}, d \rangle + \sigma n_k. \qquad (8)$$

We assume all parameters $a$, $d$, $\sigma^2$, $p$, $q$, $\lambda$ and
$\mu$ are known. Adaptive estimation is addressed in [6].

2.1.  Optimal  Demodulation 


The encoder operates at rate $1/P$, $P \ge 2$,$^1$ with generator
matrix $G : \{0, 1\}^M \to \{0, 1\}^P$. Suppose
$\mathcal{M} : \{0, 1\}^P \to \{q_1, \dots, q_{2^P}\}$, where $q_i$ is
real, denotes the modulation operation. Its task is to map the $2^P$
possible values of the encoder output onto $2^P$ real symbol values
which may be transmitted. With minor modifications we can also handle
complex modulation types such as Quadrature Amplitude Modulation (QAM).
The transmitted signal is then

$$x_k = \mathcal{M}(G X_k \bmod 2). \qquad (3)$$


Given the observations $\mathcal{Y}_k = \{y_0, y_1, \dots, y_k\}$ we
wish to obtain recursive estimates for $X_k^{(1)}$ and $X_k^{(2)}$,
perhaps with some delay $\Delta \ge 0$. If one was interested in
minimum variance or maximum a posteriori probability (MAP) estimation
of both the underlying Markov chain states, one would proceed to
determine a recursive update for the joint a posteriori probabilities

$$\bar{q}_k(i, j) = \Pr\big\{X_{k-\Delta}^{(1)} = e_i^{(1)},\ X_{k-\Delta}^{(2)} = e_j^{(2)} \,\big|\, \mathcal{Y}_k\big\}, \qquad (9)$$

The transmitted signal is propagated through a flat fading channel,
which acts on the channel as a multiplicative gain [1]. The fading
process is here modelled as a finite state Markov chain taking values
in the set $\{a_1, a_2, \dots, a_{N^{(1)}}\}$ where
$0 = a_1 < a_2 < \cdots < a_{N^{(1)}}$.$^2$ We provide some
justification for the choice of

1 More general rates can also be dealt with using a multiple input
version of (1).

2 The zero amplitude state is included to permit detection of the
presence of the signal or otherwise, if desired.


and then compute the associated conditional expectations or MAP
estimates. In the usual Viterbi algorithm (dynamic programming),
computation of (9) is replaced by a sequential maximisation over all
possible sample paths of $X_t^{(1)}$ and $X_t^{(2)}$ for
$t = 0, \dots, k - \Delta$. The new mixed estimation procedure proposed
in this paper consists of using the a posteriori probability estimates
for the $X^{(1)}$ process, coupled with a Viterbi maximum likelihood
sequence estimation criterion for $X^{(2)}$. Formally, this means
considering quantities of the form
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3.  SIMULATIONS 


qk(i,j)  =  max  Pr  {a^2),  . . .  , 

A0 

Af)  =  ef),Af)=ef)|^}.(10) 

Candidate  optimal  sequences  for  are  obtained  in  the  usual 
MLSE  manner,  except  that  the  each  time  in  the  backtracking  phase, 
MAP  estimates  are  obtained  for  xj,11  by  maximisation  (over  i)  of 
(10).  We  have  the  following  recursion  [7] : 


Aft1’ 


fj,  (Vk  +  l~aidA 

qk+i(i,j)  =  max  off  a{^  qk{n,t)  \  *  ■ 

l<f<Af(2)  J  °r<P(j/A;+l) 


(ID 


where  (j>(x)  =  -J=  e  *T  ,2 .  Initialisation  at  k  =  0  is  given  by 


=  (12) 

where  tt(1)  and  tv<2)  are  the  initial  probability  distributions  for 
X(1)  and  X,2>  respectively.  At  each  time  point,  we  keep  track 
of  the  maximising  index  in  (1 1),  ie  let 


AT*1' 

’MUH  argmax  afe]  Y]  a$  qk{n,  i).  (13) 

1  <  £  <  N(2)  n=l 

We  also  keep  track  of  the  maximising  value  of  A'n  ’  for  each  value 
ofX(2), 

mU)=  argmax  qk(i,j).  (14) 

1 < i < Nw 

Estimates  X'1^^  and  Xf^  (of  and  X^A  respec¬ 

tively)  are  produced  by  backtracking  by  a  fixed  number  A  samples 
at  each  time  k  >  A  : 


( i*,j *) 

For  s  —  k  —  1, . . .  ,k  —  A 
j*  :=  Vs+1(i*,j*), 


Then  : 


•(2)  _ 
k- A  c j *  i 


:=  argma xitjqk(i,j) 


i*  :=  Tjs  (j*) 


X 


(i) 

fc-A 


Al) 


(15) 


The  backtracking  delay  is  necessary  to  enable  proper  construction 
of  the  maximum  likelihood  sequence.  This  delay  is  chosen  suf¬ 
ficiently  large  that  all  candidate  optimal  sequences  backtracking 
from  time  k  have  merged  at  time  k  —  A.  Thus  in  order  to  apply  the 
algorithm,  the  quantities  qk  {i,  j)  are  initialised  at  time  k  =  0  ac¬ 
cording  to  (12),  and  updated  for  each  time  k  >  0  via  (1 1).  At  each 
time  we  also  retain  maximising  indices  via  (13)  and  (14).  Back¬ 
tracking  also  takes  place  at  each  time  k  >  A  according  to  (15)  to 
extract  the  desired  estimates. 


2.2.  Reduced  Complexity  Filters 

The  reader  is  referred  to  [7]  for  details  of  the  various  suboptimal 
filters  used  here. 


In this section we present results of simulation experiments used to compare 6 demodulators applied to the fading convolutionally coded signal described above. The performance of the optimal scheme, the Kalman and HMP PSP techniques, the Kalman and HMP predictor based methods, and usual MLSE with the amplitude process known to the receiver were compared. The Kalman and HMP predictor methods used decision delay $\Delta = 1$, which we argue later is the best value to choose, at least in the Kalman case. In our experiments, we did not observe any statistically significant difference between the performance of the Kalman filter based methods and the corresponding HMP based methods, i.e. the Kalman predictor method performed similarly to the HMP predictor method, and similarly for the PSP techniques.

The resulting Bit Error Rates (BER) are shown in Figure 1. Figure 2 repeats the experiment for a more rapidly varying amplitude process. It is seen that in both cases, the predictor based methods perform the worst, with the PSP methods yielding performance in between that of the predictor methods and the optimal method. The optimal technique performs quite close to the case where the receiver knows the fading process exactly. The performance gain in using the optimal filter appears to increase for higher SNRs.


Fig. 1. Bit Error Rate Performance for Filter ($p = \lambda = 0.05$)

We also examined the error behaviour of the Kalman predictor method as a function of the parameter $\Delta$. Recall that $\Delta \geq 1$ denotes the time lag (in samples) until we make a decision about the encoder state. This value is used to predict the amplitude process (gain) value forward from the Kalman filter to the Viterbi decoder. Figure 3 shows rather interesting behaviour in that the smallest possible $\Delta = 1$ resulted in the best overall BER performance. Here $p = \lambda = 0.1$, and the SNR was 29 dB. Clearly, larger smoothing lags, which one would normally expect to result in better state estimates (for the encoder process) [5], are not resulting in better performance of the overall scheme. We may conclude that the behaviour evident in Figure 3 is due to the poor prediction performance of the Kalman method. This is to be expected since it is not generally possible to accurately predict a discrete state HMM. We conclude that some sort of joint estimation procedure (either explicit as in our optimal approach, or implicit as in PSP) is really necessary to obtain reasonable performance with the model we have assumed for the fading channel amplitude process.

Fig. 2. Bit Error Rate Performance for Filter ($p = \lambda = 0.2$)

Computational complexity for the decision directed methods is low since only one amplitude tracking filter is required. For PSP and optimal processing, the complexity is greater by a factor of approximately the number of convolutional encoder states (i.e. $2^M$) since that number of tracking filters are required.
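To make the cost comparison concrete, here is a minimal sketch of one update of a mixed MAP/MLSE recursion in the spirit of (11), (13) and (14). It is an illustration under stated assumptions rather than the authors' implementation: the transition-matrix orientation, the per-state symbol table `symbols` and all function names are hypothetical, and the common normalising constant of the likelihood is dropped because it does not affect the maximisations.

```python
import math


def gaussian(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def mixed_update(q, y, trans1, trans2, gains, symbols, sigma):
    """One step of the mixed recursion.

    q[n][l]      : current weights for gain state n and encoder state l
    trans1[i][n] : assumed Pr(gain state i at k+1 | gain state n at k)
    trans2[j][l] : assumed Pr(encoder state j at k+1 | encoder state l at k)
    gains[i]     : amplitude of gain state i
    symbols[j]   : hypothetical symbol value tied to encoder state j
    """
    n1, n2 = len(gains), len(symbols)
    q_next = [[0.0] * n2 for _ in range(n1)]
    survivor = [[0] * n2 for _ in range(n1)]      # maximising l for each (i, j)
    for i in range(n1):
        for j in range(n2):
            best_val, best_l = -1.0, 0
            for l in range(n2):                   # Viterbi-style max over encoder states
                # Sum over previous gain states (the a posteriori part).
                summed = sum(trans1[i][n] * q[n][l] for n in range(n1))
                val = trans2[j][l] * summed
                if val > best_val:
                    best_val, best_l = val, l
            likelihood = gaussian((y - gains[i] * symbols[j]) / sigma) / sigma
            q_next[i][j] = best_val * likelihood
            survivor[i][j] = best_l
    # Most probable gain state for each encoder state.
    best_gain = [max(range(n1), key=lambda i: q_next[i][j]) for j in range(n2)]
    return q_next, survivor, best_gain


q0 = [[0.25, 0.25], [0.25, 0.25]]
q1, surv, eta = mixed_update(
    q0, 1.0,
    [[0.9, 0.1], [0.1, 0.9]], [[0.5, 0.5], [0.5, 0.5]],
    [0.0, 1.0], [-1.0, 1.0], 1.0,
)
print(eta)   # prints [0, 1]
```

The nested loops visit every pair of encoder states for every pair of gain states, which is where the extra factor relative to a single decision-directed tracking filter comes from.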


Fig. 3. Effect of parameter $\Delta$ on the Kalman predictor method

4.  CONCLUSION 

In this paper we have derived the optimal filter for a hidden Markov process consisting of the product of two statistically independent underlying Markov chains observed in additive white Gaussian noise, which may have state dependent moments. We apply a mixed estimation criterion in order to formulate the filter. We seek the maximum likelihood sequence corresponding to one of the underlying chains, and a posteriori probabilities (APPs) for the other underlying chain. This mixed criterion is motivated by a particular application, namely the demodulation of a rapidly fading convolutionally coded communications signal. The signal is decoded using maximum likelihood sequence estimation (MLSE). Estimation of the fading process is performed according to the maximum a posteriori probability criterion, requiring computation of APPs. The performance of the optimal filter for this example is compared to a more conventional approach consisting of decoupled estimators for each underlying chain. These estimators are standard MLSE implemented via the Viterbi algorithm for the convolutionally coded part, and a decision-directed predictor for the gain process. The case where the gain process is known to the receiver is used as a benchmark. We also compare performance with a per-survivor processing (PSP) technique which has computational complexity less than the optimal method, but greater than the simple prediction technique. In both the prediction and PSP methods, we examined both Kalman and hidden Markov process based approaches, and found no significant difference in performance between them in each case. The PSP approach has been addressed in [9], which also considers frequency selective fading. Simulations show that the predictor methods perform worst, while the optimal filter exhibits minimal performance degradation compared to the known amplitude case. The PSP technique offers performance between that of the simple prediction method and the optimal method. In this paper, we have not addressed the issue of estimating the fading process model parameters. This problem is being addressed in current work [6]. We have also not addressed frequency selective fading here, but indicate that the same idea as presented here could be applied to such cases, albeit with a substantial increase in computational requirements.

ABSTRACT

This work is concerned with extension techniques of finite signals for subband processing using tree-structured filter banks. In many applications it is desirable that the selected extension defines an orthogonal transform. Although it is clear that periodization solves this problem, some complications arise when using this technique: spurious high frequencies or artificial discontinuities appear in the transform vector. Considering AR processes as input signals, the solution of this problem is an algorithm for the generation of alternative orthogonal signal extensions which do not introduce artificial discontinuities in the subband signals. Experimental results that illustrate the effectiveness of the proposed design method are discussed briefly.

1.  INTRODUCTION  AND  NOTATION 

The use of tree-structured paraunitary filter banks for processing finite length signals requires specific techniques for handling the boundaries in order to ensure perfect reconstruction and orthogonality [1, 2, 5, 6, 7]. In the previous literature two approaches have been proposed to overcome this problem: signal extension methods [6] and boundary filters [5]. Among the traditional signal extensions the only one that preserves the orthogonality of the transformation is periodization, but it is known to introduce artificial discontinuities in the transform domain, a very annoying effect in many applications. On the other hand, although the works on boundary filters provide alternative solutions that also lead to orthogonal transformations, the problem of artificial discontinuities is not solved either.

In our recent work [3, 4] we have proposed an algorithm for the design of orthogonal extensions different from the classical periodization, but the new solutions do not necessarily provide transformations free of these spurious high frequencies. Thus, this paper is a continuation of our previous work; considering AR processes as input signals, it provides an efficient scheme for the design of orthogonal extensions which do not introduce artificial discontinuities in the transform domain.

Before starting, we summarize the notation and recall a few results from our previous papers which are necessary to follow the development of the new algorithm. We will consider only real valued signals and filters. Boldface lowercase letters will denote vectors and boldface uppercase ones will denote matrices. We use $\mathbf{H}_{m \times n}$ to represent a matrix with $m$ rows and $n$ columns; the $N$th-order null and identity matrices are respectively denoted by $\mathbf{0}_N$ and $\mathbf{I}_N$.

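We will consider the paraunitary filter bank given by the low pass filter $\mathbf{h} = [h(0), h(1), \ldots, h(L-1)]$ and the associated high pass filter $\mathbf{g} = [h(L-1), -h(L-2), \ldots, -h(0)]$, assuming that $L = 2K + 2$, with $K$ even. From these filters we can construct the matrix $\mathbf{H}_{m \times (m+2K)}$. As shown in the previous literature [5], $\mathbf{H}_{m \times (m+2K)}$ can be written in block Toeplitz form:

$$\mathbf{H}_{m \times (m+2K)} = \begin{bmatrix}
\mathbf{A}_K & \cdots & \mathbf{A}_0 & \mathbf{0}_2 & \cdots & \mathbf{0}_2 \\
\mathbf{0}_2 & \mathbf{A}_K & \cdots & \mathbf{A}_0 & \cdots & \mathbf{0}_2 \\
\vdots & & \ddots & & \ddots & \vdots \\
\mathbf{0}_2 & \cdots & \mathbf{0}_2 & \mathbf{A}_K & \cdots & \mathbf{A}_0
\end{bmatrix},$$

where, for all $j = 0, \ldots, K$,

$$\mathbf{A}_j = \begin{bmatrix} h(2j+1) & h(2j) \\ -h(L-2j-2) & h(L-2j-1) \end{bmatrix}.$$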
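The block structure above can be checked numerically. The sketch below builds the high pass filter from the low pass one, fills in the block-Toeplitz matrix row by row, and verifies that its rows are orthonormal. The Daubechies 4-tap filter is used only because its coefficients have a closed form; it gives $K = 1$, whereas the text assumes $K$ even, an assumption this particular check does not rely on. Function names are illustrative.

```python
import math


def highpass_from_lowpass(h):
    """g(n) = (-1)^n h(L-1-n), i.e. g = [h(L-1), -h(L-2), ..., -h(0)] for even L."""
    length = len(h)
    return [(-1) ** n * h[length - 1 - n] for n in range(length)]


def analysis_matrix(h, m):
    """Build the m x (m + 2K) block-Toeplitz matrix with block rows [A_K ... A_0]."""
    length = len(h)
    k = (length - 2) // 2                       # L = 2K + 2
    g = highpass_from_lowpass(h)
    rows = []
    for r in range(m // 2):                     # one block row per pair of outputs
        for taps in (h, g):
            row = [0.0] * (m + 2 * k)
            for n in range(length):
                row[2 * r + length - 1 - n] = taps[n]   # tap n sits in column 2r + L-1-n
            rows.append(row)
    return rows


# Daubechies 4-tap low pass filter (L = 4, K = 1), a convenient test case only.
s3 = math.sqrt(3.0)
H4 = [c / (4 * math.sqrt(2.0)) for c in (1 + s3, 3 + s3, 3 - s3, 1 - s3)]

H = analysis_matrix(H4, 6)
gram = [[sum(a * b for a, b in zip(u, v)) for v in H] for u in H]
is_orthonormal = all(
    abs(gram[i][j] - (1.0 if i == j else 0.0)) < 1e-12
    for i in range(len(H)) for j in range(len(H))
)
print(is_orthonormal)   # prints True
```

The rows are orthonormal only because no row is truncated; handling the truncation at the signal borders is exactly the problem the extension matrices address.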

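On the other hand, $\mathbf{H}_{K \times 3K}$ can be split into three block-Toeplitz submatrices of order $K$: $\mathbf{H}_{K \times 3K} = [\mathbf{D}\ \mathbf{E}\ \mathbf{F}]$. $\mathbf{D}$ and $\mathbf{F}$ are, respectively, upper and lower block triangular matrices [3]. Moreover, we can write $\mathbf{D} = \mathbf{Q}_1 \mathbf{K}_D \mathbf{P}_1$, $\mathbf{F} = \mathbf{Q}_0 \mathbf{K}_F \mathbf{P}_0$ and

$$\mathbf{E} = \mathbf{Q}_1 \mathbf{K}_D \mathbf{C} \mathbf{P}_0 - \mathbf{Q}_0 \mathbf{K}_F \mathbf{C}^T \mathbf{P}_1,$$

where $[\mathbf{Q}_0\ \mathbf{Q}_1]$ and $[\mathbf{P}_0^T\ \mathbf{P}_1^T]$ are orthogonal, and $\mathbf{K}_F$, $\mathbf{K}_D$ and $\mathbf{C}$ are square matrices of order $K/2$.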

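Let us consider a finite signal $\mathbf{x}$ of even length $N > 2K$: $\mathbf{x} = [x(0), x(1), \ldots, x(N-1)]^T = [\mathbf{x}_a^T\ \mathbf{x}_c^T\ \mathbf{x}_b^T]^T$, where $\mathbf{x}_a$ and $\mathbf{x}_b$ contain, respectively, the first and last $K$ components of $\mathbf{x}$, and $\mathbf{x}_c$ the remaining central ones. We define an extension of $\mathbf{x}$ as the vector $\mathbf{x}_e = [\mathbf{x}_l^T, \mathbf{x}^T, \mathbf{x}_r^T]^T$. We will study linear extensions of the type $\mathbf{x}_l = \mathbf{C}^{l,a} \mathbf{x}_a + \mathbf{C}^{l,b} \mathbf{x}_b$ and $\mathbf{x}_r = \mathbf{C}^{r,a} \mathbf{x}_a + \mathbf{C}^{r,b} \mathbf{x}_b$, where $\mathbf{C}^{l,a}, \mathbf{C}^{l,b}$ and $\mathbf{C}^{r,a}, \mathbf{C}^{r,b}$ are, respectively, the left and right extension matrices.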

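Throughout this paper we want to study the transformation of the extended vector, i.e., $\mathbf{y}_e = \mathbf{H}_{N \times (N+2K)}\, \mathbf{x}_e$. This amounts to processing the signal $\mathbf{x}_e$ by means of the analysis filter bank given by $\mathbf{h}$ and $\mathbf{g}$, only retaining the $N$ central output samples. The whole transformation of the original signal $\mathbf{x}$ can be expressed as $\mathbf{y}_e = \mathbf{G}\mathbf{x}$. It has been proven [3] that the transformation is orthogonal if and only if

$$\mathbf{G} = \begin{bmatrix}
\mathbf{D}\mathbf{C}^{l,a} + \mathbf{E} & \mathbf{F} & \mathbf{0}_{K \times (N-3K)} & \mathbf{D}\mathbf{C}^{l,b} \\
\multicolumn{4}{c}{\mathbf{H}_{(N-2K) \times N}} \\
\mathbf{F}\mathbf{C}^{r,a} & \mathbf{0}_{K \times (N-3K)} & \mathbf{D} & \mathbf{E} + \mathbf{F}\mathbf{C}^{r,b}
\end{bmatrix}$$

and there exists a unitary matrix $\mathbf{V} = \begin{bmatrix} \mathbf{V}_1 & \mathbf{V}_2 \\ \mathbf{V}_3 & \mathbf{V}_4 \end{bmatrix}$ of order $K$ such that

$$\begin{aligned}
\mathbf{D}\mathbf{C}^{l,a} &= \mathbf{Q}_1 (\mathbf{V}_1 - \mathbf{K}_D \mathbf{C}) \mathbf{P}_0 \\
\mathbf{D}\mathbf{C}^{l,b} &= \mathbf{Q}_1 \mathbf{V}_2 \mathbf{P}_1 \\
\mathbf{F}\mathbf{C}^{r,a} &= \mathbf{Q}_0 \mathbf{V}_3 \mathbf{P}_0 \\
\mathbf{F}\mathbf{C}^{r,b} &= \mathbf{Q}_0 (\mathbf{V}_4 + \mathbf{K}_F \mathbf{C}^T) \mathbf{P}_1.
\end{aligned} \qquad (1)$$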


2.  DESIGN  OF  ADAPTIVE  ORTHOGONAL 
TRANSFORMS  WITHOUT  ARTIFICIAL 
DISCONTINUITIES 

Our aim is to construct an orthogonal extension which does not introduce artificial discontinuities in the transform domain. We consider that the input signal corresponds to an AR process, so that we can assume that when extending the original signal by using linear prediction techniques, the subband vector does not present spurious high frequencies. Unfortunately, the extension obtained by means of linear prediction is not orthogonal. Therefore, our design problem can be formulated as the search for the orthogonal extension that leads to a transform vector as similar as possible to the transform resulting from an extension by linear prediction.
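A minimal sketch of extension by linear prediction at the right border follows, assuming the signal obeys an order-$K$ recursion $x(n) = \sum_{i=1}^{K} r_i\, x(n-i)$. The predicted samples are obtained by applying the $K$-th power of a companion matrix to the last $K$ samples. The function names and the cubic test signal are illustrative choices.

```python
def companion(r):
    """Companion matrix that advances the state [x(n-K+1), ..., x(n)] by one sample."""
    k = len(r)
    mat = [[0.0] * k for _ in range(k)]
    for row in range(k - 1):
        mat[row][row + 1] = 1.0                 # shift the window by one sample
    for i in range(k):
        mat[k - 1][k - 1 - i] = r[i]            # x(n+1) = sum_i r[i] * x(n-i)
    return mat


def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def right_extension(x, r):
    """Return the K predicted samples obtained from the K-th power of the companion matrix."""
    k = len(r)
    c_r = companion(r)
    for _ in range(k - 1):
        c_r = matmul(c_r, companion(r))
    x_b = x[-k:]                                # last K samples of the signal
    return [sum(c * v for c, v in zip(row, x_b)) for row in c_r]


# Samples of a cubic satisfy x(n) = 4x(n-1) - 6x(n-2) + 4x(n-3) - x(n-4).
cubic = [float(n ** 3) for n in range(8)]
print(right_extension(cubic, [4.0, -6.0, 4.0, -1.0]))
# prints [512.0, 729.0, 1000.0, 1331.0]
```

The predicted samples continue the cubic exactly, so no discontinuity is created at the border; the left border can be handled in the same way with a backward predictor.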


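2.1. Design of the transformation matrix

Let $\mathbf{x} = [\mathbf{x}_a^T, \mathbf{x}_{a'}^T, \mathbf{x}_c^T, \mathbf{x}_{b'}^T, \mathbf{x}_b^T]^T$ be a signal of length $N$, with $\mathbf{x}_a$, $\mathbf{x}_{a'}$, $\mathbf{x}_{b'}$, $\mathbf{x}_b$ of length $K = L/2 - 1$. Let us consider the signal $\mathbf{x}_{pr}$ which comes from extending $\mathbf{x}$ at each border by means of a $K$th order linear predictor. In this way, if $\mathbf{x}$ is an AR process of order $K$, so is $\mathbf{x}_{pr}$. In other words, if $\mathbf{r}$ is the vector containing the $K$ autoregression coefficients, we construct $\mathbf{x}_{pr}$ by appending $K$ samples $\mathbf{C}_r \mathbf{x}_b$ at the right border of $\mathbf{x}$. $\mathbf{C}_r$ is the $K$-th power of the companion matrix associated to $\mathbf{r}$. Analogously, we can obtain the vector $\mathbf{l}$ and extend the left border with the vector $\mathbf{C}_l \mathbf{x}_a$.$^1$ In this way,

$$\mathbf{x}_{pr} = [(\mathbf{C}_l \mathbf{x}_a)^T, \mathbf{x}_a^T, \mathbf{x}_{a'}^T, \mathbf{x}_c^T, \mathbf{x}_{b'}^T, \mathbf{x}_b^T, (\mathbf{C}_r \mathbf{x}_b)^T]^T.$$

$^1$ Note that there exists a clear relationship between $\mathbf{l}$ and $\mathbf{r}$.

On the other hand, let us consider any orthogonal extension $\mathbf{x}_e$ from $\mathbf{x}$,

$$\mathbf{x}_e = [\mathbf{x}_l^T, \mathbf{x}_a^T, \mathbf{x}_{a'}^T, \mathbf{x}_c^T, \mathbf{x}_{b'}^T, \mathbf{x}_b^T, \mathbf{x}_r^T]^T.$$

Now, we impose the respective transform vectors, $\mathbf{H}\mathbf{x}_{pr}$ and $\mathbf{H}\mathbf{x}_e = \mathbf{G}\mathbf{x} = \mathbf{y}_e$, to be as close as possible; this problem can be formulated as the minimization of the Euclidean norm

$$\|\mathbf{H}\mathbf{x}_{pr} - \mathbf{y}_e\|.$$

Both vectors are equal except for the first and last $K$ samples; thus, the first $K$ coefficients of the error vector are

$$(\mathbf{D}\mathbf{C}^{l,a} - \mathbf{D}\mathbf{C}_l)\, \mathbf{x}_a + \mathbf{D}\mathbf{C}^{l,b}\, \mathbf{x}_b,$$

whereas the last $K$ coefficients can be written as

$$(\mathbf{F}\mathbf{C}^{r,b} - \mathbf{F}\mathbf{C}_r)\, \mathbf{x}_b + \mathbf{F}\mathbf{C}^{r,a}\, \mathbf{x}_a.$$

Now we must find the extension matrices $\mathbf{C}^{l,a}$, $\mathbf{C}^{l,b}$, $\mathbf{C}^{r,a}$, $\mathbf{C}^{r,b}$ which minimize the norm $\|\mathbf{H}\mathbf{x}_{pr} - \mathbf{y}_e\|$. By using the identities (1) and the fact that $\mathbf{Q}_0$ and $\mathbf{Q}_1$ have orthonormal columns, we get the following expression for the norm to be minimized:

$$\|((\mathbf{V}_1 - \mathbf{K}_D \mathbf{C})\mathbf{P}_0 - \mathbf{K}_D \mathbf{P}_1 \mathbf{C}_l)\, \mathbf{x}_a + \mathbf{V}_2 \mathbf{P}_1 \mathbf{x}_b\|^2 + \|((\mathbf{V}_4 + \mathbf{K}_F \mathbf{C}^T)\mathbf{P}_1 - \mathbf{K}_F \mathbf{P}_0 \mathbf{C}_r)\, \mathbf{x}_b + \mathbf{V}_3 \mathbf{P}_0 \mathbf{x}_a\|^2,$$

which can also be written in a matrix-vector form:

$$\left\| \begin{bmatrix} \mathbf{V}_1 & \mathbf{V}_2 \\ \mathbf{V}_3 & \mathbf{V}_4 \end{bmatrix} \begin{bmatrix} \mathbf{P}_0 \mathbf{x}_a \\ \mathbf{P}_1 \mathbf{x}_b \end{bmatrix} - \begin{bmatrix} \mathbf{K}_D (\mathbf{P}_1 \mathbf{C}_l + \mathbf{C}\mathbf{P}_0)\, \mathbf{x}_a \\ \mathbf{K}_F (\mathbf{P}_0 \mathbf{C}_r - \mathbf{C}^T \mathbf{P}_1)\, \mathbf{x}_b \end{bmatrix} \right\|^2.$$

Taking into account that the first matrix, $\mathbf{V}$, is unitary, we can finally formulate the minimization problem as:

$$\min_{\mathbf{V}\ \text{unitary}} \|\mathbf{V}\mathbf{a} - \mathbf{b}\|,$$

being

$$\mathbf{a} = \begin{bmatrix} \mathbf{P}_0 \mathbf{x}_a \\ \mathbf{P}_1 \mathbf{x}_b \end{bmatrix}, \qquad \mathbf{b} = \begin{bmatrix} \mathbf{K}_D (\mathbf{P}_1 \mathbf{C}_l + \mathbf{C}\mathbf{P}_0)\, \mathbf{x}_a \\ \mathbf{K}_F (\mathbf{P}_0 \mathbf{C}_r - \mathbf{C}^T \mathbf{P}_1)\, \mathbf{x}_b \end{bmatrix}.$$

The Cauchy-Schwarz inequality and the fact that $\mathbf{V}$ is unitary assure that

$$\|\mathbf{V}\mathbf{a} - \mathbf{b}\|^2 \geq \|\mathbf{a}\|^2 + \|\mathbf{b}\|^2 - 2\, \|\mathbf{a}\|\, \|\mathbf{b}\| = (\|\mathbf{a}\| - \|\mathbf{b}\|)^2.$$

The minimum error $|\, \|\mathbf{a}\| - \|\mathbf{b}\|\, |$ is reached if and only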
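if $\mathbf{V}\mathbf{a}$ and $\mathbf{b}$ are proportional, so it suffices to take

$$\mathbf{V} \text{ unitary such that } \mathbf{V}\mathbf{a} = \frac{\|\mathbf{a}\|}{\|\mathbf{b}\|}\, \mathbf{b}. \qquad (2)$$

In order to build a $K$th order unitary matrix $\mathbf{V}$, we take first the normalized vectors $\mathbf{a}_1 = \mathbf{a} / \|\mathbf{a}\|$ and $\mathbf{b}_1 = \mathbf{b} / \|\mathbf{b}\|$; secondly, by using a Gram-Schmidt procedure, we build any orthonormal basis of $\mathbb{R}^K$ whose first vector is $\mathbf{a}_1$, and similarly, any orthonormal basis beginning with $\mathbf{b}_1$. The matrix which transforms the first basis into the second one is $\mathbf{V}$. If we want $\mathbf{V}$ to be a product of rotations, as in Givens parameterization, the number of degrees of freedom for constructing $\mathbf{V}$ is $\frac{(K-1)(K-2)}{2}$ (although $\mathbf{V}$ has size $K$, the condition $\mathbf{V}\mathbf{a}_1 = \mathbf{b}_1$ leads us to the problem of building a unitary matrix of size $K - 1$, so we need $\frac{1}{2}(K-1)(K-2)$ parameters).

Remark: Although there exist infinitely many unitary matrices $\mathbf{V}$ which satisfy (2) and, hence, for which the minimum error is reached, we must remark that all of them lead to a unique transform vector $\mathbf{y}_e = \mathbf{G}\mathbf{x}$. Thus, the proposed method provides a unique subband transform vector $\mathbf{y}_e$, related to the original signal $\mathbf{x}$. Nevertheless, the lack of uniqueness of $\mathbf{V}$ implies that the expression of the new adaptive orthogonal matrix $\mathbf{G}$ is not unique. Its first and last $K$ rows may vary. Moreover, if we regard these rows as containing the border filters associated to the orthogonal transform, we conclude that this method leads to the construction of an infinite number of orthogonal boundary filters which do not introduce artificial discontinuities. In other words, the proposed solution can be considered as the first design method for this kind of adaptive orthogonal boundary filters.


2.2. Error estimation

The expression for the minimum error in the frequency domain is $e_f = |\, \|\mathbf{a}\| - \|\mathbf{b}\|\, |$, which depends only on the signal and the prototype filter. The method proposed in the previous section provides the orthogonal extension that minimizes this error. On one hand, we know that

$$\|\mathbf{b}\|^2 = \|\mathbf{K}_D (\mathbf{P}_1 \mathbf{C}_l + \mathbf{C}\mathbf{P}_0)\, \mathbf{x}_a\|^2 + \|\mathbf{K}_F (\mathbf{P}_0 \mathbf{C}_r - \mathbf{C}^T \mathbf{P}_1)\, \mathbf{x}_b\|^2,$$

$$\|\mathbf{a}\|^2 = \|\mathbf{P}_0 \mathbf{x}_a\|^2 + \|\mathbf{P}_1 \mathbf{x}_b\|^2;$$

and defining $m = \|\mathbf{b}\|^2 - \|\mathbf{a}\|^2$, it can be shown, by using the relations between $\mathbf{K}_D$, $\mathbf{K}_F$ and $\mathbf{C}$ [3], that

$$-(\|\mathbf{P}_0 \mathbf{x}_a\|^2 + \|\mathbf{P}_1 \mathbf{x}_b\|^2) \leq m \leq \|\mathbf{C}_l \mathbf{x}_a\|^2 + \|\mathbf{C}_r \mathbf{x}_b\|^2.$$

On the other hand,

$$e_f = \frac{|m|}{\|\mathbf{a}\| + \|\mathbf{b}\|} \leq \sqrt{|m|};$$

therefore, we can write the following bound for the error:

$$e_f \leq \max \left\{ \left\| \begin{bmatrix} \mathbf{P}_0 \mathbf{x}_a \\ \mathbf{P}_1 \mathbf{x}_b \end{bmatrix} \right\|,\ \left\| \begin{bmatrix} \mathbf{C}_l \mathbf{x}_a \\ \mathbf{C}_r \mathbf{x}_b \end{bmatrix} \right\| \right\}.$$

We can observe that this bound depends only on the behavior of the original signal near its edges: if the absolute values of the original samples are small at the borders, so will be the error bound.


2.3. Generation of the extended signal

We are interested in the design of the associated extended vector in the time domain $\mathbf{x}_e = [\mathbf{x}_l^T, \mathbf{x}^T, \mathbf{x}_r^T]^T$. Let $\alpha = \|\mathbf{a}\| / \|\mathbf{b}\|$, and let $\mathbf{V}$ be any unitary matrix such that $\mathbf{V}\mathbf{a} = \alpha \mathbf{b}$. The submatrices of $\mathbf{V}$ satisfy

$$\begin{cases}
\mathbf{V}_1 \mathbf{P}_0 \mathbf{x}_a + \mathbf{V}_2 \mathbf{P}_1 \mathbf{x}_b = \alpha\, \mathbf{K}_D (\mathbf{P}_1 \mathbf{C}_l + \mathbf{C}\mathbf{P}_0)\, \mathbf{x}_a, \\
\mathbf{V}_3 \mathbf{P}_0 \mathbf{x}_a + \mathbf{V}_4 \mathbf{P}_1 \mathbf{x}_b = \alpha\, \mathbf{K}_F (\mathbf{P}_0 \mathbf{C}_r - \mathbf{C}^T \mathbf{P}_1)\, \mathbf{x}_b.
\end{cases}$$

If we left multiply these identities by $\mathbf{Q}_1$ and $\mathbf{Q}_0$, respectively, and apply (1), we get

$$\begin{cases}
\mathbf{D}\mathbf{x}_l = \alpha\, \mathbf{D}\mathbf{C}_l \mathbf{x}_a + (\alpha - 1)\, \mathbf{Q}_1 \mathbf{K}_D \mathbf{C} \mathbf{P}_0 \mathbf{x}_a, \\
\mathbf{F}\mathbf{x}_r = \alpha\, \mathbf{F}\mathbf{C}_r \mathbf{x}_b - (\alpha - 1)\, \mathbf{Q}_0 \mathbf{K}_F \mathbf{C}^T \mathbf{P}_1 \mathbf{x}_b.
\end{cases}$$

From these identities we derive that $\mathbf{D}\mathbf{x}_l$, $\mathbf{F}\mathbf{x}_r$ (and, therefore, the whole transform vector $\mathbf{y}_e$) no longer depend on the expression of the matrix $\mathbf{V}$, whenever $\mathbf{V}$ verifies (2). But, in the time domain, we obtain that $\mathbf{x}_l$, $\mathbf{x}_r$ are not completely determined: there exist arbitrary vectors $\mathbf{m}_1$, $\mathbf{m}_2$ such that

$$\begin{cases}
\mathbf{x}_l = \alpha\, \mathbf{C}_l \mathbf{x}_a + (\alpha - 1)\, \mathbf{P}_1^T \mathbf{C} \mathbf{P}_0 \mathbf{x}_a + \mathbf{P}_0^T \mathbf{m}_1, \\
\mathbf{x}_r = \alpha\, \mathbf{C}_r \mathbf{x}_b - (\alpha - 1)\, \mathbf{P}_0^T \mathbf{C}^T \mathbf{P}_1 \mathbf{x}_b + \mathbf{P}_1^T \mathbf{m}_2,
\end{cases}$$

so the extended vector $\mathbf{x}_e$ cannot be defined in a unique way. However, there is only one choice that best approximates $\mathbf{x}_{pr}$; in effect,

$$\|\mathbf{x}_e - \mathbf{x}_{pr}\|^2 = \|\mathbf{x}_l - \mathbf{C}_l \mathbf{x}_a\|^2 + \|\mathbf{x}_r - \mathbf{C}_r \mathbf{x}_b\|^2$$

and it can be easily shown that both norms are minimized by taking $\mathbf{m}_1 = (1 - \alpha)\, \mathbf{P}_0 \mathbf{C}_l \mathbf{x}_a$ and $\mathbf{m}_2 = (1 - \alpha)\, \mathbf{P}_1 \mathbf{C}_r \mathbf{x}_b$. For this choice, we obtain

$$\begin{cases}
\mathbf{x}_l = \mathbf{C}_l \mathbf{x}_a + (\alpha - 1)\, \mathbf{P}_1^T (\mathbf{C}\mathbf{P}_0 + \mathbf{P}_1 \mathbf{C}_l)\, \mathbf{x}_a, \\
\mathbf{x}_r = \mathbf{C}_r \mathbf{x}_b - (\alpha - 1)\, \mathbf{P}_0^T (\mathbf{C}^T \mathbf{P}_1 - \mathbf{P}_0 \mathbf{C}_r)\, \mathbf{x}_b.
\end{cases}$$

Again, the error in the time domain $e_t = \|\mathbf{x}_e - \mathbf{x}_{pr}\|$ is proportional to $\alpha - 1 = \frac{\|\mathbf{a}\| - \|\mathbf{b}\|}{\|\mathbf{b}\|}$, and this quantity only depends on the signal and the prototype filters, not on the orthogonal extension.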
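The Gram-Schmidt construction of $\mathbf{V}$ described in Section 2.1, together with the scaling factor $\alpha$ used above, can be sketched numerically as follows. The vectors `a` and `b` are arbitrary stand-ins (in the method they are built from the signal borders and the filter matrices), and the helper names are hypothetical.

```python
import math


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def orthonormal_basis(first):
    """Gram-Schmidt: orthonormal basis of R^K whose first vector is first / ||first||."""
    k = len(first)
    basis = [[x / norm(first) for x in first]]
    for i in range(k):
        if len(basis) == k:
            break
        cand = [1.0 if j == i else 0.0 for j in range(k)]   # canonical vector e_i
        for _ in range(2):                                   # two passes for numerical safety
            for q in basis:
                c = dot(cand, q)
                cand = [x - c * y for x, y in zip(cand, q)]
        n = norm(cand)
        if n > 1e-8:                                         # skip e_i if (nearly) dependent
            basis.append([x / n for x in cand])
    return basis


def unitary_mapping(a, b):
    """Real unitary V sending the basis built from a onto the basis built from b."""
    basis_a, basis_b = orthonormal_basis(a), orthonormal_basis(b)
    k = len(a)
    return [[sum(basis_b[t][r] * basis_a[t][c] for t in range(k)) for c in range(k)]
            for r in range(k)]


a = [3.0, 1.0, -2.0, 0.5]      # hypothetical stand-ins for the vectors a and b
b = [1.0, -1.0, 2.0, 4.0]
V = unitary_mapping(a, b)
Va = [dot(row, a) for row in V]
alpha = norm(a) / norm(b)
error = norm([x - y for x, y in zip(Va, b)])
print(abs(error - abs(norm(a) - norm(b))) < 1e-9)   # prints True
```

Choosing different completions of the two bases changes $\mathbf{V}$ but not $\mathbf{V}\mathbf{a}$, which is consistent with the remark that the transform vector is unique while the matrix is not.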
3. EXPERIMENTAL RESULTS

We have applied the proposed method to a great variety of finite signals, considering Daubechies filters of different lengths as prototype filters. Some results that illustrate the performance of our method are shown in the figures presented here. The first test signal is the cubic spline presented in Figure 1(a), which corresponds to an AR-4 process. The output of the two channel cell obtained using the orthogonal extension proposed in this paper and length 10 Daubechies filters is displayed in Figure 1(b), while in Figure 1(c) we can observe the transform vector when using a periodic extension. It can be clearly observed that no artificial discontinuities appear when using our extension algorithm. A more realistic signal, which corresponds to an audio frame, is shown in Figure 2(a). The transform vectors using our orthogonal extension algorithm and periodization can be observed in Figures 2(b) and 2(c) respectively. In this case the transformations have been performed using length 26 Daubechies filters. Again, our method outperforms the periodic extension technique.


Fig. 1. (a) Cubic signal; (b) transform vector using our orthogonal extension method; (c) transform vector using periodic extension.

4.  CONCLUSIONS 

In this paper we have developed a technique for processing finite length signals with paraunitary filter banks without introducing artificial discontinuities in the subband signals. The two issues that have been considered are the design of the optimal transformation matrix and the generation of the corresponding extended signal. The design procedure is formulated as an optimization problem that can be analytically solved, so that the theory can be clearly developed. The absence of artificial discontinuities in the transform domain is clear from our tests, providing a great improvement in relation to existing orthogonal signal extension methods and boundary filters.


Fig. 2. (a) Original audio frame; (b) transform vector using the proposed orthogonal extension method; (c) transform vector using periodization.

ABSTRACT

This paper studies the design of M-channel perfect-reconstruction (PR) linear-phase (LP) filter banks (FBs) with $M = 2^K$ using a tree-structured FB. It is based on an observation of Fliege [1]: the length of the analysis filters is decreased by a factor of two when the depth of the tree is increased by one, while their transition bandwidth is increased by the same factor. A lattice-based 2-channel LP FB is chosen because the frequency responses of the lowpass and highpass analysis (synthesis) filters can be designed to be closely symmetric to each other around $\pi/2$. By properly selecting the filter length, transition bandwidth, and stopband attenuation of the 2-channel PR LP FBs at each level of the tree structure, it is possible to design uniform PR LP FBs with excellent frequency characteristics and much lower system delay.

I.  INTRODUCTION 

Perfect-reconstruction (PR) linear-phase (LP) filter banks (FBs) are used in a wide range of applications, such as data compression, communications, and image and speech coding. Fig. 1 shows the block diagram of a critically decimated M-band uniform FB. The input signal is first decomposed by M analysis filters $H_i(z)$. The outputs are then decimated by a factor of M to form M subband signals. In the synthesis bank, the subband signals are upsampled by a factor of M before passing through the synthesis filters $F_i(z)$ to reconstruct the processed signal. The theory and design of PR FBs have been widely studied in the literature [2]. One efficient structure is the cosine-modulated FB (CMFB), where the analysis filters (synthesis filters) are obtained by cosine modulation of a prototype filter. Due to the cosine modulation, the implementation and design complexities of CMFBs are very low compared with general PR FBs. Unfortunately, the classical CMFB proposed in [3] does not have LP analysis and synthesis filters, which are desirable in some applications. More recently, a new class of CMFBs using different cosine and sine modulations was proposed [4]. Although the analysis and synthesis filters are LP, their frequency support is considerably different from that of a uniform FB and there is considerable overlap between the passbands of the low frequency analysis filters. Another popular class of LP M-channel uniform FBs is the linear-phase paraunitary filter bank (LPPUFB) [5], where the LP FB is parameterized as a cascade of delays and unitary matrices, which can further be parameterized as a series of planar rotations. The design of LPPUFBs can be very involved because of the large number of design parameters and the highly nonlinear dependency of the frequency response on the rotation parameters [5-8]. This usually limits the stopband attenuation of the FB.

Another commonly used method to construct PR FBs is to cascade sets of PR FBs with a smaller number of channels in a tree structure [9]. For example, an 8-channel PR FB can be obtained by cascading sets of 2-channel PR FBs in a tree structure with 3 levels as shown in Fig. 2. The output from the previous level is further decomposed using the analysis filters in that level into two more channels. In general, all the 2-channel PR FBs can differ from each other and they can be either linear- or nonlinear-phase. In the wavelet transform and most tree-structured FBs considered in the literature, the same set of PR FBs is used throughout the tree structure. Two significant drawbacks of this structure, from the viewpoint of designing a uniform FB, are the high system delay and the asymmetric transition band of the analysis filters. The latter usually results in a higher filter order to satisfy a given stopband attenuation and transition bandwidth, which further increases the total system delay. This is illustrated in Fig. 3 using a 2-channel PR FB with filter length N = 128. It can be seen that the transition bandwidths are unequal and the system delay rapidly increases to 889 samples.

In [1], Fliege has shown that the system delay of a tree-structured FB can be drastically reduced by having non-identical analysis filters in each level of the tree. More precisely, the length of the filters should decrease by a factor of two when going from one level to the next, while their transition bandwidth should increase by the same factor. In this paper, we further study this novel idea in the design of M-channel LP uniform FBs using the lattice-based two-channel LP FB proposed in [10]. The main reason for choosing the latter is that the frequency responses of the lowpass and highpass analysis (synthesis) filters can be designed to be closely symmetric to each other around $\pi/2$. By properly selecting the filter length, transition bandwidth, and stopband attenuation of the 2-channel PR LP FBs at each level of the tree structure, it is possible to design uniform LP FBs with excellent frequency characteristics and much lower system delay. For example, a uniform 8-channel PR LP FB with the same worst-case transition bandwidth requirement and stopband attenuation as the previous one (N = 128) can be achieved with the new structure with a much lower implementation complexity and a system delay of 377 samples (Fig. 4). The savings also increase linearly with the depth of the tree structure. The resulting FB, therefore, serves as a useful alternative to the LPPUFB for designing LP FBs with M a power of two and, more generally, M a composite number. Though the design of the component LPPUFBs in the latter case will become more complicated than the 2-channel LP FB in the former, it is still much simpler than directly designing an M-channel LPPUFB. The rest of the paper is organized as follows: Section II is devoted to the proposed tree-structured PR LP FB. The design procedure and some design examples are given in Section III. Conclusions are drawn in Section IV.
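The two delay figures quoted above can be reproduced with a short calculation, assuming that a 2-channel LP PR FB with length-$N$ filters contributes a delay of $N - 1$ samples at its own rate, and that a stage at level $k$ runs at $1/2^{k-1}$ of the input rate. The lengths 128, 64, 32 for the proposed design are an assumption, obtained by halving the length from one level to the next; the function name is illustrative.

```python
def tree_system_delay(filter_lengths):
    """System delay (in input samples) of a tree of 2-channel linear-phase PR filter banks.

    filter_lengths[k] is the filter length used at level k + 1 of the tree.
    A level-(k+1) stage runs at 1 / 2**k of the input rate, so its delay of
    (length - 1) samples counts 2**k times at the input rate.
    """
    return sum((length - 1) * 2 ** level for level, length in enumerate(filter_lengths))


print(tree_system_delay([128, 128, 128]))   # prints 889
print(tree_system_delay([128, 64, 32]))     # prints 377
```

With identical filters the deeper levels dominate the delay, while halving the length at each level keeps every level's contribution roughly constant.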

II. TREE-STRUCTURED PR LP FBS

First of all, let us consider an 8-channel tree-structured
uniform FB constructed by cascading 2-channel PR FBs as shown
in Fig. 2. $H_0^{(k)}(z)$ and $H_1^{(k)}(z)$ are respectively the lowpass and
highpass analysis filters of the 2-channel PR FB at the k-th level
of the tree structure, where $1 \le k \le K$, and K is the total number
of levels in the tree. In the synthesis bank, the subband signals
are recombined successively, two at a time, by a set of synthesis
filters $F_i^{(k)}(z)$. From the noble identity, we know that a
decimator with a ratio of two followed by $H(z)$ is equivalent to
$H(z^2)$ preceding the decimator. Therefore, the tree-structured
FB can be redrawn as an 8-channel uniform FB shown in Fig. 1
by moving the analysis filters to the right hand side of the tree
structure, leading to M analysis filters $H_m(z)$, $m = 1, \ldots, M$. For
convenience, let us treat the index "i" in $H_i^{(k)}(z)$ as the k-th
digit in a weighted binary representation and denote it by $b_k$.
The equivalent transfer function obtained by passing the signal
through the branch $H_{b_1}^{(1)}(z)$, $H_{b_2}^{(2)}(z)$, ..., $H_{b_K}^{(K)}(z)$ can then be
labeled as $H_m(z)$, where $m = b_1 + 2b_2 + \cdots + 2^{K-1}b_K = (b_K \cdots b_1)_2$.

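The resulting analysis filters $H_m(z)$ and synthesis filters $F_m(z)$
($0 \le m \le M-1$) can then be written as

$$H_m(z) = H_{b_1}^{(1)}(z)\,H_{b_2}^{(2)}(z^{2})\cdots H_{b_k}^{(k)}(z^{2^{k-1}})\cdots H_{b_K}^{(K)}(z^{2^{K-1}}) \qquad (1)$$

$$F_m(z) = F_{b_1}^{(1)}(z)\,F_{b_2}^{(2)}(z^{2})\cdots F_{b_k}^{(k)}(z^{2^{k-1}})\cdots F_{b_K}^{(K)}(z^{2^{K-1}}) \qquad (2)$$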

It is clear that the whole system is PR if the 2-channel FBs in
each level are PR. Further, if $H_i^{(k)}(z)$ and $F_i^{(k)}(z)$ are LP, then
so are the filters $H_m(z)$ and $F_m(z)$.
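The cascade in (1) can be checked numerically. The following is a minimal sketch, assuming each level's filter is given as a list of impulse-response coefficients; the helper names `upsample`, `convolve` and `equivalent_filter` are illustrative and not taken from the paper.

```python
def upsample(h, factor):
    """Insert factor-1 zeros between taps, i.e. map H(z) to H(z^factor)."""
    out = [0.0] * ((len(h) - 1) * factor + 1)
    for n, c in enumerate(h):
        out[n * factor] = float(c)
    return out


def convolve(a, b):
    """Multiply two polynomials in z^-1 given by their coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def equivalent_filter(branch):
    """Equivalent filter of one tree branch.

    branch[k-1] is the impulse response selected at level k; the level-k
    filter enters the product as H(z^(2^(k-1))), as in (1).
    """
    h = [1.0]
    for k, hk in enumerate(branch):
        h = convolve(h, upsample(hk, 2 ** k))
    return h


# Toy 3-level branch with short, hypothetical filters.
h_eq = equivalent_filter([[1, 1], [1, -1], [1, 2]])
```

With level lengths $N^{(k)}$, the equivalent filter has $1 + \sum_k 2^{k-1}(N^{(k)} - 1)$ taps, which is why deeper levels weigh more heavily on the overall length.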

As mentioned earlier, if the same set of FBs is employed at
all levels in the tree-structured FB, then the frequency responses
of the overall analysis filters are not identical, due to the
upsampling of z in moving the analysis filters to the left of the
decimators (Fig. 3). As a result, higher implementation
complexity is required to achieve a given transition bandwidth
and stopband attenuation. This also significantly increases the
system delay of the FB. For example, if the lattice-based PR LP
FB in [10] is used as the 2-channel FB, the system delay D of the
3-level tree-structured FB is given by

$$D = N^{(1)} + 2\left(N^{(2)} + 2\left(N^{(3)} - 1\right) - 1\right) - 1, \qquad (3)$$

where $N^{(k)}$ is the length of the FB in the k-th level. If we set
$N^{(1)} = N^{(2)} = N^{(3)} = 128$, then the resulting tree-structured FB, as
shown in Fig. 3, will have a system delay of 889 samples. It can
be  seen  that  2-channel  PR  FBs  at  different  levels  of  the  tree 
structure  contribute  differently  to  the  total  system  delay.  Each 
component  is  linearly  proportional  to  the  length  of  the  FB  used 
and  its  scalar  constant  grows  exponentially  with  the  depth  or  level 
of  the  FB  in  the  FB  tree.  To  reduce  the  total  system  delay,  it  is 
therefore  advantageous  to  reduce  the  length  of  the  FB  when  the 
level  increases. 
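The delay formula (3) can be evaluated directly. The sketch below assumes that the nesting pattern of (3) extends to any number of levels (the paper states it only for three); the function name `tree_system_delay` is illustrative.

```python
def tree_system_delay(lengths):
    """System delay of a tree-structured FB.

    lengths[k-1] is N^(k), the filter length at level k. The loop unrolls
    the nesting of (3): each level adds (N - 1) and doubles the delay
    contributed by all deeper levels.
    """
    delay = 0
    for n in reversed(lengths):  # start from the deepest level
        delay = (n - 1) + 2 * delay
    return delay


same_length = tree_system_delay([128, 128, 128])  # conventional design
halved = tree_system_delay([128, 64, 32])         # lengths halved per level
```

The two calls reproduce the delays quoted in the text for the conventional design (889 samples) and for the design with halved lengths (377 samples).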

From the design of 1-D FIR filters using the Kaiser window
method, we know that the length of the filter N, stopband
attenuation A, and transition bandwidth $\Delta\omega$ are related by the
following formula

$$N = \frac{A - 8}{2.285\,\Delta\omega}. \qquad (4)$$

For a given stopband attenuation and passband ripple, the filter
length is inversely proportional to the transition bandwidth. From
(1), it can be seen that the worst-case stopband attenuation of
$H_m(z)$ is equal to the worst-case stopband attenuation of its
factors $H_{b_1}^{(1)}(z), \ldots, H_{b_K}^{(K)}(z^{2^{K-1}})$, and its transition bandwidth will
depend on those of $H_{b_1}^{(1)}(z), \ldots,$ and $H_{b_K}^{(K)}(z^{2^{K-1}})$. Due to the
upsampling by a factor of $2^{k-1}$, the transition bandwidth of
$H_{b_k}^{(k)}(z^{2^{k-1}})$ will be $2^{k-1}$ times narrower than that of $H_{b_k}^{(k)}(z)$.
Thus, to achieve a uniform transition bandwidth, the length of the
2-channel PR FB should be reduced by a factor of 2 when we go
from one level to the next. In other words,

$$N^{(1)} = 2 N^{(2)} = 4 N^{(3)} = \cdots = 2^{K-1} N^{(K)}, \qquad (5)$$

$$\Delta^{(1)} = \Delta^{(2)}/2 = \Delta^{(3)}/4 = \cdots = \Delta^{(K)}/2^{K-1}. \qquad (6)$$

The system delay D (for K = 3) is now reduced to $D = 3N^{(1)} - 7$,
which grows only linearly with the depth of the tree. The
arithmetic complexity, in terms of the numbers of multiplications
and additions per unit time (MPUs and APUs), is also reduced from
99 MPUs and 291 APUs to 59 MPUs and 171 APUs. This novel
technique has been mentioned in [1], but unfortunately the
selection of the 2-channel PR FB and detailed design examples are
missing. In this work, we shall show that this approach can be
used to obtain M-channel LP PR uniform FBs ($M = 2^K$, K a
positive integer) with very good frequency characteristics, using
the lattice-based 2-channel LP PR FB proposed in [10]. Although
the efficient structure by Phoong et al. [11] can also be used in a
similar manner, and is very attractive because of its low
design and implementation complexities [12], the frequency
responses of its associated lowpass and highpass filters are not
quite symmetric to each other. Therefore, the frequency
characteristic at the transition band edges will start to degrade
when they are cascaded to form a tree structure with a large
number of channels. If one is comfortable with nonlinear-phase
FIR filters (i.e. only passband linear-phase), then the CQF [9] and
the general low-delay 2-channel FB [13] can also be employed.
The latter will further reduce the total system delay of the FB.
Interested readers are referred to [14] for more details regarding
their design and factorization. We now consider the design of
the 2-channel lattice-based LP FB and some design examples.

III. DESIGN PROCEDURE AND EXAMPLES
A. DESIGN PROCEDURE

For a 2-channel critically decimated FB, the PR condition is
given by

$$H_0^{(k)}(-z)H_1^{(k)}(z) - H_0^{(k)}(z)H_1^{(k)}(-z) = p\,z^{-d}, \qquad (7)$$

where d is a positive integer and p is a nonzero constant. The
synthesis filters are given by $F_0^{(k)}(z) = H_1^{(k)}(-z)$ and
$F_1^{(k)}(z) = -H_0^{(k)}(-z)$. For our LP FB, $H_0^{(k)}(z)$ and $H_1^{(k)}(z)$ are
chosen respectively to be symmetric and antisymmetric, having
the same filter length, which is an even number. Instead of
optimizing the lattice coefficients, which involves a highly
nonlinear objective function, the coefficients of the filters $H_0^{(k)}(z)$
and $H_1^{(k)}(z)$ are obtained by solving the following constrained
optimization

$$\min_{h}\ \Phi = \sigma\int_{0}^{\pi-\omega_s^{(k)}}\left(1 - |H_0^{(k)}(e^{j\omega})|\right)^2 d\omega + (1-\sigma)\int_{\omega_s^{(k)}}^{\pi}|H_0^{(k)}(e^{j\omega})|^2\, d\omega$$
$$\qquad + (1-\sigma)\int_{0}^{\pi-\omega_s^{(k)}}|H_1^{(k)}(e^{j\omega})|^2\, d\omega + (1-\sigma)\int_{\omega_s^{(k)}}^{\pi}\left(1 - |H_1^{(k)}(e^{j\omega})|\right)^2 d\omega$$
subject to (7), (8)

where $\omega_s^{(k)}$ ($\pi/2 < \omega_s^{(k)} < \pi$) and $\pi - \omega_s^{(k)}$ are the stopband cut-off
frequencies of $H_0^{(k)}(z)$ and $H_1^{(k)}(z)$, respectively, $\sigma$ is a
weighting constant from 0 to 1, and h is the vector containing the
free variables in the impulse response. The transition bandwidth
is $\Delta^{(k)} = \omega_s^{(k)} - \pi/2$. It is assumed that the FBs at stage k are all
identical. The constrained optimization is solved using the
DCONF subroutine in the IMSL library. On average, it takes 150
iterations for convergence and the violation of the PR constraints is
of the order of $10^{-15}$.

Given the number of channels $M = 2^K$, transition
bandwidth $\Delta^{(1)}$ and stopband attenuation A, the 2-channel PR
LP FB at the first level is first designed by the above method to
satisfy the given specification. Suppose that a filter length of
$N^{(1)}$ is required. The (K - 1) 2-channel PR LP FBs at the other
levels can be designed with parameters given in (5) and (6).
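The parameter schedule implied by (4), (5) and (6) can be sketched as follows. This is only an illustration of the scaling rules, not the constrained design itself; the function names `kaiser_length` and `level_parameters` are assumptions made for this example.

```python
import math


def kaiser_length(atten_db, delta_omega):
    """Kaiser estimate (4): N = (A - 8) / (2.285 * delta_omega)."""
    return (atten_db - 8.0) / (2.285 * delta_omega)


def level_parameters(n1, delta1, levels):
    """Return [(N^(k), Delta^(k))] for k = 1..levels.

    Following (5) and (6), the length is halved and the transition
    bandwidth doubled each time the level index increases by one.
    """
    params = []
    for k in range(1, levels + 1):
        scale = 2 ** (k - 1)
        if n1 % scale != 0:
            raise ValueError("N^(1) must be divisible by 2^(levels-1)")
        params.append((n1 // scale, delta1 * scale))
    return params


params = level_parameters(128, 0.025 * 2 * math.pi, 3)
lengths = [n for n, _ in params]
```

For a first-level length of 128 and a first-level transition bandwidth of 0.025 x 2π, this yields lengths 128, 64, 32 and transition bandwidths 0.025, 0.05, 0.1 (x 2π), the values listed in Table 1. Doubling the transition bandwidth in `kaiser_length` halves the estimated length, which is the relation the schedule relies on.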

B.  DESIGN  EXAMPLES 

We now present two examples: i) a two-level tree-
structured FB with K = 2 and M = 4, and ii) a three-level tree-
structured FB with K = 3 and M = 8. For simplicity, $N^{(1)}$, $N^{(2)}$,
and $N^{(3)}$ are chosen as 128, 64 and 32, respectively. The
frequency responses of the three two-channel LP FBs are shown
in Fig. 5 (a), (b) and (c). It can be seen that they have
approximately the same stopband attenuation but successively
wider transition bands. The frequency responses of the 4-band and
8-band LP analysis FBs obtained by cascading these 2-channel
FBs in a tree structure are shown in Fig. 6 and Fig. 4,
respectively. It can be seen that the frequency characteristic of this
LP 8-channel PR FB is very good and a stopband attenuation over
50 dB can be readily obtained. The design parameters of the
lowpass filters are summarized in Table 1. As a final remark, we shall
contrast the relative merits of this tree-structured FB and the
LPPUFB. As mentioned earlier, the LPPUFB usually involves a
considerable number of parameters, especially when the number
of channels and the filter length are large. The objective function is
also a highly nonlinear function of the planar rotation parameters.
All these somewhat limit the stopband attenuation of the FB that
can be designed. The proposed tree-structured FB is relatively
easy to design because the 2-channel LP FBs can be designed
separately. This limits the number of parameters in each
subproblem to a reasonable value. Moreover, the design of a
2-channel LP PR FB is much easier than designing an LPPUFB, and
a number of efficient design methods are already available. On
the other hand, the major disadvantage of the tree-structured FB
is the restriction that the number of channels M is a power of
two. Though it is also possible to form tree-structured FBs by
cascading FBs with 2, 3 and larger numbers of channels using a
similar approach, there is still a fundamental limitation on the
number of channels that can be designed.

IV.  CONCLUSION 

A method for designing M-channel LP PR FBs with
$M = 2^K$ using a tree-structured FB is presented. It is based on a
previous observation of Fliege, where the length of the analysis
filters is decreased by a factor of two when the depth of the tree is
increased by one, while the transition bandwidth is increased by
the same factor. A lattice-based 2-channel LP FB is chosen
because the frequency responses of the lowpass and highpass
analysis (synthesis) filters can be designed to be closely
symmetric to each other around $\pi/2$. By properly selecting the
filter length, transition bandwidth, and stopband attenuation of
the 2-channel PR LP FBs at each stage of the tree structure, it is
possible to design uniform PR LP FBs with excellent frequency
characteristics and much lower system delay.

Fig. 1. The block diagram of a critically decimated uniform M-band FB.

                                      k = 1              k = 2              k = 3
H_0^(k)(z) for 4-band uniform FB      H_0^(1)(z)         H_0^(2)(z)         -
H_0^(k)(z) for 8-band uniform FB      H_0^(1)(z)         H_0^(2)(z)         H_0^(3)(z)
Length of filter N^(k)                128                64                 32
Stopband cut-off frequency ω_s^(k)    0.275 x 2π         0.3 x 2π           0.35 x 2π
Transition bandwidth Δ^(k)            0.025 x 2π         0.05 x 2π          0.1 x 2π
Parameters to be optimized            128                64                 32
Implementation complexity             33 MPUs, 97 APUs   17 MPUs, 49 APUs   9 MPUs, 25 APUs

Table 1. Parameters of the lowpass filters in tree-structured FB.

Fig. 3. The frequency responses of 3-level tree-structured FB
by conventional method.

Fig. 4. The frequency responses of 3-level tree-structured FB
by the proposed method.

Fig. 5. Frequency responses of {$H_0^{(k)}(z)$ and $H_1^{(k)}(z)$} in (a) level 1, (b) level 2, and (c) level 3.

ABSTRACT

Recently, there has been increasing interest in designing
structurally perfect reconstruction (PR) filter banks because
such systems can be implemented using sum-of-powers-of-
two (SOPOT) coefficients. Structurally PR filter banks
can be designed by factorization based on the lifting scheme, but
this approach has some problems, which are addressed in this paper.
An improvement of the factorization that solves these problems is
proposed, and a proof of the improvement is
given. Finally, design examples show that the proposed
method is effective.

I. INTRODUCTION

Perfect reconstruction (PR) multirate filter banks (FBs), which
are used for dividing a signal into frequency bands and
reconstructing the signal from the individual bands, have
important applications in signal analysis, signal coding and
the design of wavelet bases [1-4]. Many researchers [5-11]
have proposed constrained optimization techniques for
designing linear-phase and low-delay PR two-channel filter
banks. However, the filter banks so obtained
are in general only pseudo PR. To overcome this problem,
structurally PR filter banks are desired. An efficient lattice-
type PR-QMF bank that "structurally" ensures the PR
property was reported by Vaidyanathan and Hoang [6]. But it
is difficult to design with a general-purpose nonlinear
optimization technique because of problems such as
nonconvergence, step-size selection and local suboptimality.
Another type of PR filter bank, based on an algebraic formulation,
was proposed by Pinchon [12]. The filter bank can be
factorized into a cascade of N blocks. Unfortunately,
only linear-phase filter banks were discussed.
Actually, a general factorization of PR filter banks can be
obtained with a lifting scheme. The lifting scheme, first proposed
by Donoho [13], is also available for designing structurally
PR filter banks. An important advantage is that it
can also be used for biorthogonal wavelets. Early progress in
the lifting scheme focused on the design of the discrete
wavelet transform or two-band subband filtering. Such a lifting
scheme utilizes the Euclidean algorithm for polynomials in the
factorization. Vetterli [14] employed the Euclidean algorithm
and the close connection between Diophantine equations and
the PR conditions to parameterize all solutions of highpass
filters for a given lowpass filter. Daubechies [15] applied
the lifting scheme to design the discrete wavelet filter bank.
This factorization, which is based on the lifting scheme, can
also be used for general two-channel PR filter banks if the
determinant of the polyphase matrix is equal to a constant
multiple of a signal delay. It can be used to convert a
numerically optimized nearly PR filter bank into a
structurally PR system. However, after factorization there may
exist non-causal polynomials, and the two analysis filters may
have similar frequency responses.

In this paper, an improvement of the factorization that solves these
problems is proposed. Its derivation is also
given. The paper is organized as follows: the existing
factorization method for two-channel filter banks is described
in Section 2. The improved factorization algorithm
and the associated constraints are addressed in Section 3.
Design procedures and several examples, including linear-
phase and low-delay filter banks, are given in Section 4.
Finally, conclusions are drawn in Section 5.

II.  Factorization  of  Two-Channel  FIR 
Filter  Banks  Using  Lifting  Scheme 

Consider the basic structure of a two-channel FIR filter bank
with analysis filters {$H_0(z)$, $H_1(z)$} and synthesis filters
{$F_0(z)$, $F_1(z)$} in Fig. 1.

Figure 1. The two-channel filter bank

The relationship between the input $x(z)$ and the output $\hat{x}(z)$
is given by

$$\hat{x}(z) = T(z)x(z) + A(z)x(-z), \qquad (1)$$

where $T(z) = \tfrac{1}{2}[H_0(z)F_0(z) + H_1(z)F_1(z)]$ and
$A(z) = \tfrac{1}{2}[H_0(-z)F_0(z) + H_1(-z)F_1(z)]$.

Setting $F_0(z) = -H_1(-z)$ and $F_1(z) = H_0(-z)$, the aliasing
term $A(z)$ is equal to zero. The condition to achieve perfect
reconstruction with FIR synthesis filters after an FIR analysis
section can be expressed as

$$H_{00}(z)H_{11}(z) - H_{01}(z)H_{10}(z) = \beta z^{-d}, \qquad (2)$$

which is called the Bezout identity [5], where
$H_0(z) = H_{00}(z^2) + z^{-1}H_{01}(z^2)$, $H_1(z) = H_{10}(z^2) + z^{-1}H_{11}(z^2)$,
$\beta$ is some constant, and the system delay parameter d is some
integer. In general, the design problem of the two-channel PR
filter bank is formulated as a constrained non-linear
optimization problem, whose solution is not robust to coefficient
quantization. One problem with the constrained optimization
approach is that the filter bank is not completely PR. A
method to solve this problem is factorization using the lifting
scheme, shown as follows.
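The alias-cancellation choice of synthesis filters can be verified on a toy example. The sketch below uses unnormalized Haar filters as a hypothetical test case (they do not appear in the paper) and represents each filter by its coefficients in powers of $z^{-1}$; all helper names are illustrative.

```python
def poly_mul(a, b):
    """Product of two polynomials in z^-1 (coefficient lists)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def poly_add(a, b):
    """Sum of two polynomials, padding the shorter one with zeros."""
    n = max(len(a), len(b))
    a = list(a) + [0.0] * (n - len(a))
    b = list(b) + [0.0] * (n - len(b))
    return [x + y for x, y in zip(a, b)]


def scale(a, s):
    return [s * x for x in a]


def negate_z(h):
    """Coefficients of H(-z): the z^-n term is multiplied by (-1)^n."""
    return [c if n % 2 == 0 else -c for n, c in enumerate(h)]


h0 = [1.0, 1.0]    # toy lowpass  H0(z) = 1 + z^-1
h1 = [1.0, -1.0]   # toy highpass H1(z) = 1 - z^-1
f0 = scale(negate_z(h1), -1.0)   # F0(z) = -H1(-z)
f1 = negate_z(h0)                # F1(z) =  H0(-z)

# Transfer term T(z) and aliasing term A(z) of equation (1).
T = scale(poly_add(poly_mul(h0, f0), poly_mul(h1, f1)), 0.5)
A = scale(poly_add(poly_mul(negate_z(h0), f0),
                   poly_mul(negate_z(h1), f1)), 0.5)
```

For this toy pair the aliasing term vanishes and $T(z)$ reduces to a scaled delay, $-2z^{-1}$. Its polyphase components are constants ($H_{00} = H_{01} = H_{10} = 1$, $H_{11} = -1$), so the left side of (2) equals $-2$, i.e. $\beta = -2$ and $d = 0$.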

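Let the analysis FIR filters be $H_0(z) = \sum_{n=0}^{N_0-1} h_0(n) z^{-n}$ and
$H_1(z) = \sum_{n=0}^{N_1-1} h_1(n) z^{-n}$. Without loss of generality, suppose that
$|H_1(z)| \ge |H_0(z)|$ ($|H(z)|$, the degree of a Laurent
polynomial, is defined as $|H(z)| = p_1 - p_0$ if
$H(z) = \sum_{n=p_0}^{p_1} h(n) z^{-n}$). $H_{10}(z)$ and $H_{11}(z)$ can be expressed as

$$H_{10}(z) = \tilde{H}_{10}(z) - Q(z)H_{00}(z), \qquad H_{11}(z) = \tilde{H}_{11}(z) - Q(z)H_{01}(z), \qquad (3)$$

where $\tilde{H}_{10}(z)$ and $\tilde{H}_{11}(z)$ satisfy Eq. (2) and $Q(z)$ is
some real polynomial. Therefore, all the highpass filters
$H_1(z)$ that are "complementary" to the lowpass filter
$H_0(z)$ are given by $H_1(z) = \tilde{H}_1(z) - Q(z^2)H_0(z)$. Using
the Euclidean algorithm on $H_{00}(z)$ and $H_{01}(z)$, one gets the
following matrix product

$$\begin{bmatrix} H_{00}(z) \\ H_{01}(z) \end{bmatrix} = \prod_{i=1}^{n} \begin{bmatrix} q_i(z) & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} K \\ 0 \end{bmatrix},$$

where $q_i(z)$ are polynomials, K is a non-zero constant, and n is some integer.
The particular solution $\tilde{H}_{10}(z)$ and $\tilde{H}_{11}(z)$ can be obtained
by constructing the following polyphase matrix $P(z)$ with
determinant $\beta z^{-d}$, as required by Eq. (2):

$$P(z) = \begin{bmatrix} H_{00}(z) & \tilde{H}_{10}(z) \\ H_{01}(z) & \tilde{H}_{11}(z) \end{bmatrix} = \prod_{i=1}^{n} \begin{bmatrix} q_i(z) & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} K & 0 \\ 0 & (-1)^{n} z^{-d}\beta/K \end{bmatrix}, \qquad (4)$$

so we have

$$\begin{bmatrix} H_{00}(z) & H_{10}(z) \\ H_{01}(z) & H_{11}(z) \end{bmatrix} = \begin{bmatrix} H_{00}(z) & \tilde{H}_{10}(z) \\ H_{01}(z) & \tilde{H}_{11}(z) \end{bmatrix} \begin{bmatrix} 1 & -Q(z) \\ 0 & 1 \end{bmatrix} = \prod_{i=1}^{n} \begin{bmatrix} q_i(z) & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} K & 0 \\ 0 & (-1)^{n} z^{-d}\beta/K \end{bmatrix} \begin{bmatrix} 1 & -Q(z) \\ 0 & 1 \end{bmatrix}.$$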

Therefore, the two-channel filter bank $H_0(z)$ and $H_1(z)$ can
be factorized into the polynomials $q_i(z)$ and some constants. The advantage
of this factorization is that it is robust to coefficient
quantization. But since the factorization is implemented by a
simple long division algorithm for Laurent polynomials, there
exist two problems: the first is that the $H_1(z)$ obtained from
Eq. (4) may be a lowpass filter, and the $H_1(z)$ from Eq. (3) is not
always a highpass filter, when $H_0(z)$ and $H_1(z)$ have the same length;
the second is that the $q_i(z)$ may be non-causal.


III. IMPROVEMENT OF THE
FACTORIZATION ALGORITHM


The Euclidean algorithm is based on long division for Laurent
polynomials [14-17]. Let us analyse the long division.
Consider two causal Laurent polynomials $a_0(z)$ and $b_0(z)$
with $|a_0(z)| \ge |b_0(z)|$, and assume that their coefficients of
$z^0$ are nonzero. Then there always exist a Laurent
polynomial $q_1(z)$ (the quotient) and a Laurent polynomial
$r_0(z)$ (the remainder) with $|r_0(z)| < |b_0(z)|$, so that

$$a_0(z) = q_1(z)b_0(z) + r_0(z). \qquad (5)$$

The quotient $q_1(z)$ and the remainder $r_0(z)$ can be
calculated by long division. If $q_1(z)b_0(z)$ has to match the
beginning or end terms of $a_0(z)$, the division is called fore-
long division or back-long division, respectively.
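The two kinds of division can be illustrated on coefficient lists. This is a minimal sketch, assuming causal polynomials stored as coefficients of $z^0, z^{-1}, z^{-2}, \ldots$ with nonzero first and last entries in the divisor; the function names are illustrative and the example polynomials are arbitrary.

```python
def fore_long_division(a, b):
    """Match the beginning (lowest powers of z^-1) of a(z) with q(z) b(z).

    Returns (q, r) with a = q*b + r, where the first len(q) entries of r
    are zero, i.e. r(z) = z^-k r0(z) with k = len(q) = |q| + 1.
    """
    r = [float(x) for x in a]
    q = []
    for i in range(len(a) - len(b) + 1):
        c = r[i] / b[0]
        q.append(c)
        for j, bj in enumerate(b):
            r[i + j] -= c * bj
    return q, r


def back_long_division(a, b):
    """Match the end (highest powers of z^-1) of a(z) with q(z) b(z).

    Returns (q, r) with a = q*b + r, where the last len(q) entries of r
    are zero, so the remainder has lower degree than b.
    """
    r = [float(x) for x in a]
    nq = len(a) - len(b) + 1
    q = [0.0] * nq
    for i in range(nq - 1, -1, -1):
        c = r[i + len(b) - 1] / b[-1]
        q[i] = c
        for j, bj in enumerate(b):
            r[i + j] -= c * bj
    return q, r


a0 = [2, 3, 4]   # a0(z) = 2 + 3 z^-1 + 4 z^-2
b0 = [1, 1]      # b0(z) = 1 + z^-1
q_fore, r_fore = fore_long_division(a0, b0)
q_back, r_back = back_long_division(a0, b0)
```

Here the fore-long division gives $q_1(z) = 2 + z^{-1}$ with remainder $3z^{-2}$, while the back-long division gives $q_1(z) = -1 + 4z^{-1}$ with remainder $3$. Both satisfy $a_0 = q_1 b_0 + \text{remainder}$, which shows that the division is not unique.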

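If the fore-long division is taken, then

$$\begin{bmatrix} a_0(z) \\ b_0(z) \end{bmatrix} = \begin{bmatrix} q_1(z) & z^{-k} \\ 1 & 0 \end{bmatrix} \begin{bmatrix} b_0(z) \\ r_0(z) \end{bmatrix}, \qquad (6)$$

or, if the back-long division is used, then

$$\begin{bmatrix} a_0(z) \\ b_0(z) \end{bmatrix} = \begin{bmatrix} q_1(z) & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} b_0(z) \\ r_0(z) \end{bmatrix}, \qquad (7)$$

where $k = |q_1(z)| + 1$, and the coefficient of $z^0$ in $r_0(z)$ is
nonzero. Hence, the division is non-unique. Based on the
above study, a fully PR two-channel filter bank $H_0(z)$ and
$H_1(z)$ can be factorized as follows: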


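$$\begin{bmatrix} H_{00}(z) & H_{10}(z) \\ H_{01}(z) & H_{11}(z) \end{bmatrix} = \prod_{i=1}^{j} \begin{bmatrix} q_i(z) & z^{-k_i} \\ 1 & 0 \end{bmatrix} \prod_{i=j+1}^{n} \begin{bmatrix} q_i(z) & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} K & -Q(z) \\ 0 & (-1)^{n-1}\beta/K \end{bmatrix}. \qquad (8)$$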


Since $H_0(z)$ and $H_1(z)$ are constrained by Eq. (2), the
number j of divisions taken in a given form during the factorization is
also constrained.

Proposition: Suppose that, after factorization, j terms of $q(z)$
are obtained by fore-long division and n - j terms of $q(z)$
by back-long division. Then the number j and
$|q_i(z)|$ {i = 1, ..., j} satisfy the following constraint:

$$\sum_{i=1}^{j} \left(|q_i(z)| + 1\right) = d. \qquad (9)$$

Proof: We define $D_{-1} = H_{00}$, $D_0 = H_{01}$, $A_{-1} = H_{10}$ and
$A_0 = H_{11}$. In this notation (2) becomes

$$D_{-1}(z)A_0(z) - A_{-1}(z)D_0(z) = \beta z^{-d}. \qquad (10)$$

Now use fore-long division starting with the pair
$D_{-1}(z)$, $D_0(z)$. The first step gives

$$D_{-1}(z) = q_1(z)D_0(z) + z^{-k_1} D_1(z).$$

Also do one division of the pair $A_{-1}(z)$, $A_0(z)$, denoting the
remainder by $A_1(z)$:

$$A_{-1}(z) = p_1(z)A_0(z) + z^{-l_1} A_1(z).$$

$A_1(z)$ and $D_1(z)$ have nonzero coefficients of $z^0$. If we
choose $|p_1(z)| = |q_1(z)|$, then $l_1 = k_1 = |q_1(z)| + 1$.
Together these equations give

$$\beta z^{-d} = D_{-1}(z)A_0(z) - A_{-1}(z)D_0(z) = \left(q_1(z) - p_1(z)\right)A_0(z)D_0(z) + z^{-k_1}\left(D_1(z)A_0(z) - A_1(z)D_0(z)\right).$$

Since the coefficient of $z^0$ in $A_0(z)D_0(z)$ is nonzero, we
must have $p_1(z) = q_1(z)$, and hence

$$D_0(z)A_1(z) - A_0(z)D_1(z) = -\beta z^{-(d-k_1)}.$$

Since this is of the same form as, but of lower degree than, the
equation (10) that we started with, we can compare the
second step of this factorization with the fore-long division of
$A_0(z)$, $A_1(z)$, and this gives $p_2(z) = q_2(z)$ when
$d - k_1 - k_2 \ge 0$. The result is that we get a succession of
Bezout identities

$$D_{j-1}(z)A_j(z) - A_{j-1}(z)D_j(z) = (-1)^{j}\beta z^{-\left(d - \sum_{i=1}^{j} k_i\right)},$$

which are of decreasing degree. We find in turn that
$p_1(z) = q_1(z), \ldots, p_j(z) = q_j(z)$. When $d - \sum_{i=1}^{j} k_i = 0$,
$D_{j-1}(z)A_j(z) - A_{j-1}(z)D_j(z) = (-1)^{j}\beta$. It is clear that if
the fore-long division is continued at this point, then
$p_{j+1}(z) \ne q_{j+1}(z)$. The back-long division must be used
instead. We have

$$D_{j-1}(z) = q_{j+1}(z)D_j(z) + D_{j+1}(z), \qquad A_{j-1}(z) = p_{j+1}(z)A_j(z) + A_{j+1}(z),$$

where $|D_j(z)| > |D_{j+1}(z)|$ and $|A_j(z)| > |A_{j+1}(z)|$. $A_{j+1}(z)$ and
$D_{j+1}(z)$ also have nonzero coefficients of $z^0$.
In the same way, let $|p_{j+1}(z)| = |q_{j+1}(z)|$. Then

$$(-1)^{j}\beta = D_{j-1}(z)A_j(z) - A_{j-1}(z)D_j(z) = \left(q_{j+1}(z) - p_{j+1}(z)\right)A_j(z)D_j(z) + \left(D_{j+1}(z)A_j(z) - A_{j+1}(z)D_j(z)\right).$$

Since $|A_j(z)D_j(z)| > |D_{j+1}(z)A_j(z)|$ and
$|A_j(z)D_j(z)| > |A_{j+1}(z)D_j(z)|$, we must have
$p_{j+1}(z) = q_{j+1}(z)$.

The rest can be deduced by analogy until $D_{n-1}(z)$ is a
monomial (a constant K), namely $|D_{n-1}(z)| = 0$. We have
$D_{n-2}(z) = q_n(z)D_{n-1}(z)$ and $D_n(z) = 0$. Substituting them
into $D_{n-2}(z)A_{n-1}(z) - A_{n-2}(z)D_{n-1}(z) = (-1)^{n-1}\beta$, we
get $A_{n-2}(z) = q_n(z)A_{n-1}(z) + (-1)^{n-1}\beta / D_{n-1}(z)$. Now, letting
$Q(z) = -A_{n-1}(z)$ and $K = D_{n-1}(z)$, the whole factorization can
be formulated in matrix form.

Actually, we can also use back-long division in the first step
of the factorization. The order in which fore- or back-long division is
taken can be arbitrarily arranged during the
factorization.

IV.  Procedure  of  Design  and  Examples 

The procedure of the factorization:

(1) The first step is to design a nearly PR filter bank {$H_0(z)$
and $H_1'(z)$}. Note that d should be equal to and less
than $(1/4)(N_0 + N_1) - 1$, respectively, for linear-phase
and low-delay filter banks, where $N_0$ and $N_1$ are the
lengths of $H_0(z)$ and $H_1'(z)$. The design problem can
be formulated as a constrained non-linear optimization
problem, which can be solved by the NCONF/DCONF
subroutine in the IMSL library. The passband ripples
and the stopband attenuation of the lowpass and
highpass filters are optimized under the PR
constraint; the objective function $\Phi_{near}$ to be
minimized is as follows:

Ccop0  .  |2  rn  .  •  |2 

min<P,iear=<7  I  (1- |ffo(e;,B)  )'</«  +  (1-<T)  I  l//0(c'J“)|  d(0 

h  J0  I  I  Jmso  I  I 

J'COs,  |  |2  fJT  I  |2  ^ 

\H[{e1C0)\  d(o  +  {\-c)\  Q.-\H[(e],0)\  )2dco 

0  I  I  Jcop]  I  I 

subject to the PR condition in (2). (12)

Here, h is the vector containing the impulse responses
of $H_0(z)$ and $H_1'(z)$; $\sigma$ is a weighting constant from 0
to 1 which is used to control the relative importance of
the errors in the stopband and passband; $\omega_{p0}$ and $\omega_{p1}$
are the passband cut-off frequencies of $H_0(z)$ and
$H_1'(z)$; $\omega_{s0}$ and $\omega_{s1}$ are the stopband cut-off
frequencies of $H_0(z)$ and $H_1'(z)$.

(3) For a nearly PR filter bank, the $q_i(z)$ and $p_i(z)$ obtained in step (2)
are not exactly the same. Substituting $q_i(z)$ for $p_i(z)$
in Eq. (13), a new filter $H_1(z)$ can be obtained by
Eq. (11). $H_0(z)$ and $H_1(z)$ then constitute a
structurally PR filter bank.

Using the above factorization, we find that $q_i(z)$ is always
causal so long as $H_0(z)$ and $H_1'(z)$ are causal. Moreover, $H_1(z)$
can be guaranteed to be a highpass filter because its frequency
response resembles that of $H_1'(z)$.

We now present several design examples. The first one is a
low-delay PR FB with length $N_0 = N_1 = 24$, where $N_i$ is the
length of the filter $H_i(z)$. The factorization matrix is as
follows.

ffiO«  Hooiz.)  A  qt(z)  rA  <h(z)  ljA  -Q(z) 

Hlt(Z)  H0,(z)J“ll[  1  0  1  oJ[o  (-1  )"-'p/K 

The system delay is 15 samples (d = 7). Fig. 2 plots the
frequency responses of its analysis bank. The frequency
responses of the optimized filters before and after
factorization are shown in dashed and solid lines, respectively.
They are fairly close to each other. The coefficients of the
filter bank are shown in Table 1. The second one is a linear-
phase filter bank with length $N_0 = N_1 = 32$. It is factorized as
follows.

Hio (z)  tfoo(z)]_THr  q,(z)  z^lrT-kW  i"p  ~Q(z) 

»,i(z)  «oi(*)J  in.  1  0  Ml  1  01°  (-d"_1/3/at 

The frequency responses of the optimized filters before and
after factorization are shown in dashed and solid lines,
respectively, in Fig. 3, and their coefficients are listed in Table 2.

V.  CONCLUSION 

In this paper, an improved factorization technique
for designing two-channel filter banks is presented. It
avoids some problems of the existing factorization. The
design results suggest that a structurally PR filter bank with
good frequency characteristics can be obtained.

Fig. 2. Frequency responses for the low-delay filter bank with
length 24: before (solid line) and after factorization (dashed line)

Table 1: The coefficients of the low-delay filter bank with length
24 after factorization

Fig. 3. Frequency responses for the linear-phase filter bank with length
32: before (solid line) and after factorization (dashed line)

Table 2: The coefficients of the linear-phase filter bank with
length 32 after factorization

ABSTRACT

In this paper, we propose a novel subband adaptive
broadband beamforming architecture based on the generalised
sidelobe canceller (GSC), in which we decompose
each of the tapped delay-line signals feeding the
adaptive part of the GSC and the reference signal into
subbands and perform adaptive minimisation of the
mean squared error in each subband independently.
Besides its lower computational complexity, this new
subband adaptive GSC outperforms its fullband counterpart
in terms of convergence speed because of its prewhitening
effect. Simulations based on different kinds
of blocking matrices with different orders of derivative
constraints are presented to support these findings.

1.  INTRODUCTION 

Adaptive beamforming has found many applications in
various areas ranging from sonar and radar to wireless
communications. It is based on a technique where, by
adjusting the weights of a sensor array with attached
filters, a prescribed spatial and spectral selectivity is
achieved. Fig. 1 shows a beamformer with M sensors
receiving a signal of interest from the direction of arrival
(DOA) angle $\vartheta$.


Fig. 1: A signal impinging from an angle $\vartheta$ onto a beamformer
with M sensors.


To perform beamforming with high interference rejection
and resolution, arrays with a large number of
sensors and filter coefficients have to be employed. To
facilitate real-time implementation, various methods
are employed to reduce the computational complexity,
such as partially adaptive beamforming [1],
wavelet-based beamforming [2] and subband beamforming
[3]. In the latter, the received sensor signals are first
split into decimated subbands, then an independent
beamformer is applied to each subband. The advantage
arises from the processing in decimated subbands,
although at the expense of having to project constraints
into the subband domain as well.

We here focus on a linearly constrained minimum
variance (LCMV) beamformer, which can be efficiently
implemented as a generalized sidelobe canceller (GSC)
[4, 5]. Different from [3], instead of performing beamforming
in subbands by decomposing the input sensor
signals, we employ subband adaptive filtering techniques
for the adaptive process of the GSC structure
only. Specifically, noting that there are in total M - S
input tapped delay-lines for the adaptive part of the
GSC, we decompose each of the tapped delay-line signals
and the reference signal d[n] into K subbands by a
K-channel filter bank as shown in Fig. 3 and perform
adaptive minimisation in each subband. Simulation results
with different blocking matrices and different orders
of derivative constraints show that this new method
outperforms the fullband counterpart in addition to its
very low computational complexity.

The rest of this paper is organised as follows: Section 2 briefly reviews broadband beamforming based on a generalized sidelobe canceller with derivative constraints. In Section 3, we introduce the proposed subband-based GSC structure. Simulation results are given in Section 4 and conclusions are drawn in Section 5.


Fig. 2: Structure of a generalized sidelobe canceller.


Fig. 3: K channel filter banks with decimation N (analysis filter bank followed by synthesis filter bank).


2.  GENERALIZED  SIDELOBE 
CANCELLER 


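An LCMV beamformer performs the minimization of the variance or power of the output signal with respect to some given spatial and spectral constraints. For a beamformer with M sensors and J filter taps following each sensor as shown in Fig. 1, the output e[n] can be expressed as

$$e[n] = \mathbf{w}^{H} \cdot \mathbf{x}_n \qquad (1)$$

where coefficients and input sample values are defined as

$$\mathbf{w} = [\mathbf{w}_0^{T}\ \mathbf{w}_1^{T}\ \ldots\ \mathbf{w}_{J-1}^{T}]^{H} \qquad (2)$$

$$\mathbf{w}_l = [w_0[l]\ w_1[l]\ \ldots\ w_{M-1}[l]]^{T} \qquad (3)$$

$$\mathbf{x}_n = [\tilde{\mathbf{x}}_n^{T}\ \tilde{\mathbf{x}}_{n-1}^{T}\ \ldots\ \tilde{\mathbf{x}}_{n-J+1}^{T}]^{T} \qquad (4)$$

$$\tilde{\mathbf{x}}_n = [x_0[n]\ x_1[n]\ \ldots\ x_{M-1}[n]]^{T} . \qquad (5)$$

The data vector $\tilde{\mathbf{x}}_n$ is a time slice as given in Fig. 1. A coefficient $w_m[l]$ is defined to sit at the tap position $l$ of the $m$th filter. The LCMV problem can now be formulated as [6]

$$\min_{\mathbf{w}} \mathbf{w}^{H} \mathbf{R}_{xx} \mathbf{w} \quad \text{subject to} \quad \mathbf{C}^{H} \mathbf{w} = \mathbf{f} , \qquad (6)$$

where $\mathbf{R}_{xx}$ is the covariance matrix of observed array data in $\mathbf{x}_n$, $\mathbf{C} \in \mathbb{C}^{MJ \times SJ}$ is a constraint matrix and $\mathbf{f} \in \mathbb{C}^{SJ}$ is the constraining vector. The constraint matrix here imposes derivative constraints of order $S-1$ [7],

[Eq. (7), which builds $\mathbf{C}$ from blocks $\mathbf{C}_i$, is not recoverable from the source]

with $\mathbf{c}_i = [(-m_0)^i\ (1-m_0)^i\ \ldots\ (M-1-m_0)^i]$ and a phase origin point $m_0$.

The constrained optimisation of the LCMV problem in (6) can be conveniently solved using a GSC. The GSC performs a projection of the data onto an unconstrained subspace by means of a blocking matrix $\mathbf{B}$ and a quiescent vector $\mathbf{w}_q$. Thereafter, standard unconstrained optimisation algorithms such as least mean square (LMS) or recursive least squares (RLS) algorithms can be invoked [8]. Fig. 2 shows the principle of a GSC, where the desired signal d[n] is obtained via $\mathbf{w}_q$,

$$d[n] = \mathbf{w}_q^{H} \cdot \mathbf{x}_n \quad \text{with} \quad \mathbf{w}_q = \mathbf{C}(\mathbf{C}^{H}\mathbf{C})^{-1}\mathbf{f} . \qquad (8)$$

The input signal $\mathbf{u}_n = [u_0[n]\ u_1[n]\ \ldots\ u_{M-S-1}[n]]^{T}$ to the following multichannel adaptive filter (MCAF) is obtained by $\mathbf{u}_n = \mathbf{B}^{H} \tilde{\mathbf{x}}_n$, whereby the $M \times (M-S)$ blocking matrix $\mathbf{B}$ must satisfy

$$\tilde{\mathbf{C}}^{H} \mathbf{B} = \mathbf{0} \quad \text{where} \quad \tilde{\mathbf{C}} = [\mathbf{c}_0\ \mathbf{c}_1\ \ldots\ \mathbf{c}_{S-1}] . \qquad (9)$$

In the next section, we will focus on the multiple-input optimisation process and introduce our subband adaptive GSC structure by employing the subband adaptive filtering techniques.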
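As an illustration of the blocking-matrix condition in this section, the following is a minimal sketch, not the authors' implementation. It builds the constraint vectors c_i for a single time slice and obtains a blocking matrix from a singular value decomposition. The values M = 15 and S = 2 are taken from the simulation section; the phase origin at the array centre and all function names are assumptions made for this example.

```python
import numpy as np

def constraint_vectors(M, S, m0):
    """Columns c_i = [(-m0)^i, (1-m0)^i, ..., (M-1-m0)^i] for i = 0..S-1."""
    m = np.arange(M, dtype=float) - m0
    return np.stack([m**i for i in range(S)], axis=1)   # shape (M, S)

def blocking_matrix_svd(C):
    """Orthonormal basis of the null space of C^H, read off the SVD of C."""
    U, _, _ = np.linalg.svd(C, full_matrices=True)
    return U[:, C.shape[1]:]                             # shape (M, M - S)

M, S = 15, 2
m0 = (M - 1) / 2            # assumed phase origin at the array centre
C = constraint_vectors(M, S, m0)
B = blocking_matrix_svd(C)

# The blocking condition requires C^H B = 0 (up to rounding error).
residual = np.linalg.norm(C.conj().T @ B)
print(B.shape, residual)
```

Taking the last M − S left singular vectors is one way of realising an SVD-based blocking matrix; other constructions, such as the CCD method used later in the paper, satisfy the same condition.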


3.  SUBBAND  ADAPTIVE  GENERALIZED 
SIDELOBE  CANCELLER 

Subband decompositions for adaptive filtering applications are commonly based on oversampled modulated filter banks (OSFB) as shown in Fig. 3, where the input signal is divided into K frequency bands by analysis filters and then decimated by a factor N. Due to oversampling, i.e. N < K, a low alias level in the subband signals can be achieved. This is important since aliasing will limit the performance of a subband adaptive filtering (SAF) system [9]. Due to its lower update rate and fewer coefficients to represent an impulse response of a given length, the subband implementation only necessitates $K/N^2$ ($K/N^3$) of the operations required for a fullband adaptive algorithm with a complexity of $O(L_a)$ ($O(L_a^2)$), where $L_a$ is the total number of coefficients in the fullband realisation [3].
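As a rough numerical illustration of these ratios, the short sketch below evaluates them for the filter bank used in the simulation section (K = 8, N = 6). Associating the linear-complexity case with LMS-type algorithms and the quadratic case with RLS-type algorithms is the usual reading and is an assumption here.

```python
K, N = 8, 6                       # subbands and decimation factor from Section 4

ratio_linear = K / N**2           # algorithms with O(L_a) complexity, e.g. LMS-type
ratio_quadratic = K / N**3        # algorithms with O(L_a^2) complexity, e.g. RLS-type

print(f"{ratio_linear:.3f} {ratio_quadratic:.3f}")   # prints "0.222 0.037"
```

Under these assumptions the subband adaptation needs roughly a fifth of the fullband operations in the linear case, and under 4% in the quadratic case, not counting the cost of the filter banks themselves.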

When applying SAF techniques to the MCAF in the GSC structure in Fig. 2, the subband setup shown in Fig. 4 arises. There, the blocks labelled A perform an OSFB analysis operation, splitting the signal into K frequency bands, each running at an N times lower sampling rate than the fullband input to the block. Within each subband, an independent MCAF is operated, and a synthesis filter bank, labelled S, recombines the different subsystem outputs into a fullband beamformer output e[n].


Fig. 4: Subband adaptive GSC; an independent MCAF is applied to each subband.

In addition to the lower computational complexity of this subband adaptive GSC, it promises faster convergence speed for LMS-type adaptive algorithms because of the pre-whitening effect of the input signal. Next, we will give some simulation results to demonstrate the performance of our subband adaptive GSC.
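For readers unfamiliar with LMS-type adaptation, the following is a minimal sketch of a normalised LMS (NLMS) update for a single real-valued channel. It is a generic textbook form and not the multichannel subband implementation of the paper; the function name, the unknown system `h` and the regularisation constant `eps` are assumptions made for this example.

```python
import numpy as np

def nlms(u, d, num_taps, mu=0.3, eps=1e-8):
    """Adapt an FIR filter w so that its output tracks the desired signal d."""
    w = np.zeros(num_taps)
    buf = np.zeros(num_taps)            # tapped delay line, newest sample first
    e = np.zeros(len(u))
    for n in range(len(u)):
        buf = np.roll(buf, 1)
        buf[0] = u[n]
        e[n] = d[n] - w @ buf           # error between desired and filter output
        w += mu * e[n] * buf / (buf @ buf + eps)   # normalised gradient step
    return w, e

rng = np.random.default_rng(0)
h = np.array([0.5, -0.3, 0.2, 0.1])     # hypothetical unknown system
u = rng.standard_normal(4000)           # white input signal
d = np.convolve(u, h)[:len(u)]          # noise-free desired signal
w, e = nlms(u, d, num_taps=4)
```

With a white input the update converges quickly; with a strongly coloured input the same step size typically converges more slowly, which is the effect that subband pre-whitening is meant to reduce.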

4.  SIMULATIONS  AND  RESULTS 

In our simulation, we use a beamformer with M = 15 sensors and J = 60 coefficients for each attached filter. Each of the input signals $u_i[n]$ (i = 0, 1, ..., M − S − 1) and the reference signal d[n] are divided into K = 8 subbands by an oversampled GDFT filter bank [10] with decimation factor N = 6 as characterised in Fig. 5. This subband adaptive GSC is constrained to receive a signal of interest from broadside, which is white Gaussian with unit variance. The beamformer should adaptively suppress a broadband interference signal covering the frequency interval $\Omega = [0.25\pi;\ 0.75\pi]$ from $\vartheta = 30°$ and with a signal-to-interference ratio (SIR) of -24 dB. The sensor signals are corrupted by additive Gaussian noise at an SNR of 20 dB.


Fig. 5: Magnitude response of K = 8 channel filter bank decimated by N = 6 (horizontal axis: normalised angular frequency $\Omega/\pi$).


Fig. 6: Learning curves for simulation I (S = 2).


Fig. 7: Learning curves for simulation II (S = 2).


In order to compare the performance of our subband method with its fullband counterpart, we give four examples based on two commonly used approaches for building the blocking matrix, each with two different orders of constraints. The first approach is based on the cascaded columns of difference (CCD) method [11], the second on a singular value decomposition (SVD) [5]. The four examples are: (I) SVD method with first order derivative constraints (S = 2), (II) CCD method with S = 2, (III) SVD method with zero order derivative constraints (S = 1), (IV) CCD method with S = 1.

The step size in the NLMS adaptation is set to $\mu = 0.30$ for the first two examples, and to $\mu = 0.20$ for examples (III) and (IV). Simulation results for these four cases are shown in Fig. 6 to Fig. 9, respectively. As a performance criterion, these figures display the ensemble mean square value of the residual error, which is defined as the difference between the beamformer output e[n] and the appropriately delayed desired signal received from broadside.

Fig. 8: Learning curves for simulation III (S = 1).


Fig. 9: Learning curves for simulation IV (S = 1).


From these results we can see that the subband adaptive method always has a faster convergence speed because of its pre-whitening effect. Comparing Fig. 6 with Fig. 7 and Fig. 8 with Fig. 9, we see that the fullband performance changes with the way the blocking matrix is built, whereas the subband method has a relatively uniform performance independent of these settings. With the added benefit of its low computational complexity due to processing in decimated subbands, the presented subband method outperforms the traditional fullband implementation.

5.  CONCLUSIONS 

A novel subband adaptive Generalized Sidelobe Canceller for broadband beamforming has been proposed. By employing subband adaptive filtering techniques, the computational complexity is greatly reduced. Moreover, the new method can also achieve a faster convergence speed because of its pre-whitening effect. Superiority of this new method to fullband implementation has been demonstrated by four examples based on different approaches for the blocking matrix and different orders of derivative constraints.

Abstract

Fractional-delay digital filters (FD-DF) implemented using the Farrow structure are very attractive for providing online-tunable delay of digital signals. This paper proposes a new method for the design of such Farrow-based FD-DF using sum-of-powers-of-two (SOPOT) coefficients. Using the SOPOT coefficient representation, coefficient multiplication can be implemented with a limited number of shifts and additions. Design examples show that the proposed method can greatly reduce the design time and complexity of the Farrow structure while providing comparable phase and amplitude responses.

I.  Introduction 

Fractional-delay digital filters (FD-DF) are very useful for delaying signals, which is required in many applications such as software radio, digital modems, arbitrary sampling rate conversion and time-delay estimation. The Farrow structure [1] is particularly attractive because it can provide variable signal delay, making high-speed online tuning feasible. The basic principle of the Farrow structure is to approximate the impulse response of an ideal fractional-delay digital filter with delay $\mu$ by polynomial interpolation from the impulse responses of a limited set of fractional-delay digital filters with delays equally spaced within a range usually chosen to be $\mu \in [-0.5, 0.5]$. To implement the Farrow structure, the signal is passed through these sub-filters and their outputs are multiplied by the appropriate powers of $\mu$ to produce the output (Figure 1). The number of sub-filters required is equal to the order of the polynomial approximation plus one. For precise control of the signal delay, the length of these sub-filters and the order of the polynomial approximation will be considerable, requiring a large number of multipliers to implement this structure. As a result, higher power dissipation and a larger area for VLSI implementation are expected.

In this paper, a novel algorithm for designing the Farrow structure with sum-of-powers-of-two (SOPOT) coefficients is proposed. SOPOT representation of filter coefficients is an attractive method for VLSI or hardware implementation of digital filters because multiplication by SOPOT coefficients can be implemented efficiently using hard-wired shifters and adders only (i.e. multiplier-less). More precisely, all the coefficients of the sub-filters in the Farrow structure are represented in SOPOT form and are implemented as additions and hard-wired shifts. To further reduce the number of adders required in this structure, a transposed form of the sub-filters in the Farrow structure is employed, which allows us to implement all the SOPOT multiplications with a single multiplier-block (MB) [3]. The application of MB to the efficient implementation of interpolated filters and filter banks was reported in [3]. Unfortunately, the design of such a multiplier-less Farrow structure was missing. Using MB, the redundancy in the multiplication of the input with all the constant multipliers can be fully exploited through the reuse of the intermediate results generated. In principle, it is possible to remove all the redundancy found in the constant multipliers, leading to a significant reduction in the number of adders required to implement the Farrow structure. The proposed design algorithm consists of two steps. An FD-DF with real-valued coefficients is first designed. A flexible and efficient “random search” algorithm is then employed to search for the SOPOT coefficients while minimizing some criterion, such as the number of SOPOT terms used, subject to a given frequency specification. This random search algorithm is similar to the mutation step of a genetic algorithm (GA) and the random walk in simulated annealing. The main difference here is that we have limited its search space to a small neighborhood of the real-valued solution. This greatly shortens the search time to a few minutes. Our experience also indicates that excellent SOPOT solutions can be obtained in a reasonable time even when the filters involved are IIR. This is very difficult to achieve by GA, even with a design time several orders of magnitude longer [9]. The latter is mainly due to the high sensitivities of the poles. Another advantage of this algorithm is that it can also be used to minimize directly the hardware cost, such as the adder cells of the filters, taking into account round-off and overflow noise [9]. There are many methods for designing FD-DF with real-valued coefficients [1][4][8]. In this work, the prototype fractional-delay filters for the Farrow structure are designed using complex Chebyshev approximation, which is readily available in MATLAB. They are then interpolated to obtain the sub-filter coefficients for generating the MB. Design examples show that more than half of the adders in the SOPOT coefficients can be saved with slight or negligible degradation in frequency responses, representing a significant saving in hardware resources and power dissipation.

This paper is organized as follows: the efficient Farrow structure with SOPOT coefficients and multiplier block is introduced in Section II. Its design algorithm is described in Section III, followed by several examples in Section IV. Finally, conclusions are drawn in Section V.

II.  The  Efficient  Farrow  Structure 

As mentioned earlier, one problem with the implementation of variable fractional-delay digital filters is the dependence of the impulse responses of the FD-DF on the delay parameter $\mu$. More precisely, the output of the FD-DF is given by

$$y[(m + D + \mu)T] = \sum_{i=0}^{N-1} x[(m - i)T] \cdot h_\mu(i) , \qquad (1)$$

where $x[mT]$ is the input signal sampled at a period $T$, $h_\mu(i)$ is the FD-DF with delay $D + \mu$, $D$ is an integer constant, and $N$ is the length of $h_\mu(i)$. To avoid the implementation of a large number of filters with different delays, Farrow [1] proposed to approximate each impulse response $h_\mu(i)$ with the following $P$th-order polynomial in the delay value $\mu$, such that the delay control is independent of the filter coefficients:

$$h_\mu(i) = \sum_{n=0}^{P} b_n(i)\, \mu^n . \qquad (2)$$

Substituting (2) into (1) gives

$$y[(m + D + \mu)T] = \sum_{n=0}^{P} \left[ \sum_{i=0}^{N-1} x[(m - i)T] \cdot b_n(i) \right] \mu^n . \qquad (3)$$

Figure 1 shows the Farrow structure for implementing equation (3), where the input signal is passed through a number of sub-filters $b_n(i)$, $n = 0, \ldots, P$, and is multiplied by the appropriate powers of $\mu$ to produce the output. Though the Farrow structure is very useful in providing a continuously variable signal delay, it still requires a large number of multiplications for implementation, especially when P and N are large to provide very precise control of the frequency characteristics of the FD-DF. One method to avoid the expensive multipliers is to convert the filter coefficients into the following SOPOT representation

$$b_n(i) = \sum_{k=1}^{L} b_{n,k}(i) \cdot 2^{a_k} , \qquad (4)$$

with $b_{n,k}(i) \in \{-1, 1\}$ and $a_k \in \{-l, \ldots, -1, 0, 1, \ldots, l\}$, where $l$ is a positive integer and its value determines the range of the coefficients. L is the number of terms used in the coefficient approximation and is usually limited to a small number. The coefficient multiplications can therefore be implemented as a limited number of shifts and additions, resulting in a significant reduction in implementation complexity. Very often, there is also significant redundancy in these SOPOT coefficients, which appears as common sub-expressions among different SOPOT coefficients. Due to the z-operator, it is somewhat difficult to remove these sub-expressions without considerably increasing the number of shift registers. Fortunately, thanks to the transposed form of the sub-filters, the Farrow structure can be rewritten as in Figure 2. In this new structure, the input is multiplied by a number of constant coefficients. Hence, the common sub-expressions within the SOPOT coefficients can be eliminated [5][6] using a single multiplier-block, which further reduces the complexity of the Farrow structure.
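The polynomial form of the Farrow structure, with sub-filter outputs combined by powers of the delay parameter, can be sketched as follows. This is an illustrative example only: the sub-filter coefficients are computed from the textbook cubic Lagrange basis on the nodes −1, 0, 1, 2, which may be indexed differently from the coefficients used in the paper, and the function names are hypothetical.

```python
import numpy as np

def lagrange_farrow_coeffs(nodes=(-1.0, 0.0, 1.0, 2.0)):
    """Return b with b[n, i] = coefficient of mu**n in the i-th Lagrange basis polynomial."""
    nodes = np.asarray(nodes, dtype=float)
    b = np.zeros((len(nodes), len(nodes)))
    for i, xi in enumerate(nodes):
        others = np.delete(nodes, i)
        poly = np.poly(others) / np.prod(xi - others)   # highest power first
        b[:, i] = poly[::-1]                             # store lowest power first
    return b

def farrow_output(x_taps, b, mu):
    """Evaluate sum_n (sum_i x_i b[n, i]) mu**n with Horner's rule in mu."""
    v = b @ x_taps                 # one output per sub-filter
    y = 0.0
    for vn in v[::-1]:
        y = y * mu + vn
    return y

b = lagrange_farrow_coeffs()
f = lambda t: t**3 - 2 * t + 1                    # test signal: a cubic polynomial
x_taps = f(np.array([-1.0, 0.0, 1.0, 2.0]))       # samples at the nodes
y = farrow_output(x_taps, b, mu=0.3)              # interpolated value at t = 0.3
```

Because the sub-filters are fixed and only `mu` changes, the delay can be retuned sample by sample without recomputing any filter coefficients. For this particular basis, multiplying all coefficients by 6 gives integers.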

A number of methods have been proposed for designing the Farrow structure-based FD-DF [1]. Given the real-valued coefficients of the Farrow structure, it remains to determine the SOPOT coefficients $\hat{b}_n(i)$ that satisfy a given specification with the minimum number of adders in the multiplier-block. Commonly used distortion measures are the least-squares and the minimax criteria. Without loss of generality, we shall employ the minimax criterion in this paper. Let $H(e^{j\omega}, \mu)$ and $\hat{H}(e^{j\omega}, \mu)$ be the frequency responses of the real-valued Farrow structure and its SOPOT counterpart; then the design problem can be stated as follows:

Given a set of initial Farrow coefficients $b_n(i)$, the maximum number of terms L in each coefficient and the dynamic range $l$ of the coefficients, determine the SOPOT coefficients $\hat{b}_n(i)$ such that the maximum value of the phase response error $\delta_p$ is minimized subject to a given peak amplitude error $\delta_a \le \varepsilon$, where

$$\delta_p = \max_{0 \le \omega \le \pi,\ |\mu| \le 0.5} \left| \frac{\arg\{\hat{H}(e^{j\omega}, \mu)\} - \arg\{H(e^{j\omega}, \mu)\}}{\omega} \right|$$

$$\text{subject to} \quad \delta_a = \max_{0 \le \omega \le \pi,\ |\mu| \le 0.5} \left| \hat{H}(e^{j\omega}, \mu) - H(e^{j\omega}, \mu) \right| \le \varepsilon . \qquad (5)$$

III.  Design  Procedure 

The Farrow structure is designed in two steps: prototype filters with specific fractional delays are designed first, and the sub-filters of the Farrow structure are then obtained by interpolating these prototype filters. The prototype filters are designed using complex Chebyshev approximation, which is readily available in Matlab as cremez. For example, if the interpolation order is 3, then there are 4 sub-filters. We are then required to design a batch of prototype filters, all of equal length (say 10 or more), with frequency responses corresponding to the fractional delays $\mu_l = -0.5 + (l-1)/9$. For each tap position, the corresponding coefficients of all prototype filters are then interpolated by a third-order polynomial in the least-squares sense. Repeating this interpolation procedure for every tap of the prototype filters yields the polynomial coefficients, which are the initial full-precision coefficients of the Farrow structure.
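The per-tap polynomial fit described above can be sketched as follows. As an explicitly swapped-in simplification, the prototype filters here are Lagrange fractional-delay filters rather than complex Chebyshev (cremez) designs, so that the example stays self-contained; the integer delay `D = 1`, the filter length and all names are assumptions.

```python
import numpy as np

def lagrange_fd(delay, order=3):
    """FIR fractional-delay prototype: h[n] = prod_{k != n} (delay - k) / (n - k)."""
    n = np.arange(order + 1)
    h = np.ones(order + 1)
    for k in range(order + 1):
        mask = n != k
        h[mask] *= (delay - k) / (n[mask] - k)
    return h

D = 1                                     # assumed integer part of the delay
mus = -0.5 + np.arange(10) / 9            # mu_l = -0.5 + (l - 1)/9 for l = 1..10
protos = np.array([lagrange_fd(D + mu) for mu in mus])    # shape (10, 4)

P = 3                                     # interpolation order, giving P + 1 sub-filters
coef = np.polyfit(mus, protos, P)         # least-squares fit of every tap at once
b = coef[::-1]                            # b[n, i] multiplies mu**n for tap i

# Check how well the polynomials reproduce the prototype filters.
recon = np.vander(mus, P + 1, increasing=True) @ b
max_err = np.max(np.abs(recon - protos))
```

For these Lagrange prototypes each tap is exactly a cubic in the delay, so the fit is exact up to rounding; for Chebyshev-designed prototypes a residual fitting error would normally remain.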


The optimization procedure consists of two stages. First, the SOPOT coefficients of the initial Farrow structure are determined such that the performance measure in (5) is minimized using a random search technique. Then, the minimum number of adders needed in the multiplier-block is determined. The generation of the multiplier-block from the SOPOT coefficients follows the algorithms proposed in [3]. Let $\mathbf{b}$ be the vector containing the initial values $b_n(i)$ of the Farrow structure. The principle of the random search algorithm is to generate random candidate SOPOT coefficients in the neighborhood of $\mathbf{b}$, so as to search for the optimal discrete solution. More precisely, a new coefficient vector $\mathbf{b}_{NEW}$ is generated by adding a random vector to the original coefficient vector $\mathbf{b}$, as follows:

$$\mathbf{b}_{NEW} = \lfloor \mathbf{b} + \alpha \cdot \mathbf{b}_R \rfloor_{SOPOT} , \qquad (6)$$

where $\alpha$ is a scale factor which controls the size of the neighborhood to be searched, $\mathbf{b}_R$ is a vector whose elements are random numbers in the range [-1, 1], and $\lfloor \cdot \rfloor_{SOPOT}$ is the rounding operation which converts its argument to the nearest SOPOT coefficients, with the maximum number of terms in each coefficient being L and dynamic range $l$. The performance measures $\delta_p$ and $\delta_a$ of the new coefficients are then calculated. The set that yields the minimum phase error with a given peak amplitude error $\varepsilon$ is the optimum solution under the constraints of L and $l$. As this is a random search algorithm, the longer the search time, the higher the chance of finding the optimal solution.
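A minimal sketch of such a random search is given below. Several parts are assumptions made for illustration: the rounding to SOPOT form is a simple greedy approximation, which is not guaranteed to return the nearest representable value; the cost function is a plain peak frequency-response deviation for a toy FIR filter rather than the phase and amplitude measures of the paper; and the coefficient values are hypothetical.

```python
import numpy as np

def sopot_round(value, L, l):
    """Greedy approximation of value by at most L signed powers of two 2**a, |a| <= l."""
    powers = 2.0 ** np.arange(-l, l + 1)
    approx = 0.0
    for _ in range(L):
        r = value - approx
        term = np.sign(r) * powers[np.argmin(np.abs(np.abs(r) - powers))]
        if abs(r - term) >= abs(r):        # adding another term would not help
            break
        approx += term
    return approx

def random_search(b, cost, L, l, alpha, iters, seed=0):
    """Perturb b randomly, round to SOPOT form, and keep the best candidate."""
    rng = np.random.default_rng(seed)
    quantise = np.vectorize(lambda v: sopot_round(v, L, l))
    best = quantise(b)
    best_cost = cost(best)
    for _ in range(iters):
        b_r = rng.uniform(-1.0, 1.0, size=b.shape)    # random vector in [-1, 1]
        cand = quantise(b + alpha * b_r)              # candidate near b
        c = cost(cand)
        if c < best_cost:
            best, best_cost = cand, c
    return best, best_cost

b = np.array([0.11, 0.32, 0.53, 0.32, 0.11])          # hypothetical real-valued taps
w = np.linspace(0, np.pi, 64)
E = np.exp(-1j * np.outer(w, np.arange(len(b))))      # DFT-like evaluation matrix
cost = lambda c: np.max(np.abs(E @ c - E @ b))        # toy peak response error

start_cost = cost(np.vectorize(lambda v: sopot_round(v, 2, 6))(b))
best, best_cost = random_search(b, cost, L=2, l=6, alpha=0.05, iters=200)
```

Keeping `alpha` small restricts the candidates to a neighbourhood of the real-valued design, which is what keeps the search short; a larger `alpha` explores more widely at the cost of more wasted candidates.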

There are several advantages of this algorithm. First of all, with the computational power of today's personal computers (PCs), the time for obtaining high-quality solutions is manageable. In fact, for the problem considered here, the overall design takes less than 5 minutes to complete on a typical Pentium-400 PC using Matlab 5.3, including both the design of the SOPOT coefficients and the multiplier-block design. Secondly, it is applicable to problems with general objective functions, possibly with very complicated inequality constraints. Moreover, a set of possible solutions representing different tradeoffs between computational complexity and performance is generated during the search, which helps one to achieve an appropriate tradeoff for a given application. It is also possible to combine the two stages to improve the performance, but the computational time will be greatly increased.

IV.  Design  Examples 

Example  1 

As a simple example, the famous cubic Lagrange interpolator [7], with coefficients shown in Table 1, is implemented using the proposed algorithm. The passband bandwidth under optimization is from 0 to $0.4\pi$. The original peak ripple error and phase delay error are, respectively, 0.048769 and 0.008179. By multiplying all the coefficients by 6, they can be converted to simple integers. Using the concept of the multiplier-block, the additions in implementing the multipliers 3 and 6 can be shared to reduce the total number of adders. The final Farrow structure implemented using the multiplier-block is shown in Figure 3. (The “>>n” sign in Figure 3 means a hard-wired shift towards the LSB by n bits; the “<<n” sign means a hard-wired shift towards the MSB by n bits.) It requires only 3 adders, including the scaling (1/6) in SOPOT coefficients at the output.
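The sharing of additions between the multipliers 3 and 6 can be illustrated with integer shifts. This is only a sketch of the idea and does not reproduce the complete structure of Figure 3; the function name is hypothetical.

```python
def multiplier_block_3_6(x: int):
    """Compute 3x and 6x with one adder and hard-wired shifts."""
    x3 = (x << 1) + x      # 3x = 2x + x: one shift and one adder
    x6 = x3 << 1           # 6x = 2 * (3x): shift only, reusing the result for 3x
    return x3, x6

products = [multiplier_block_3_6(x) for x in range(-8, 9)]
```

Computed separately, 6x = 4x + 2x would need an adder of its own; reusing the intermediate result 3x removes it, which is the kind of redundancy a multiplier-block exploits.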

Example  2 

As another example, let us consider the coefficients provided by Farrow in [1]. The SOPOT coefficients obtained by the random search algorithm are shown in Table 2. The bandwidth under consideration for this filter is from 0 to $0.6\pi$. The original peak ripple error and phase delay error are 0.006271 and 0.0032, respectively, whereas for the SOPOT Farrow structure, the peak ripple error and phase delay error are 0.005371 and 0.0046, respectively. After common sub-expression elimination, the multiplier-block requires only 13 adders, which compares favorably with the 32 real multiplications in the original Farrow structure. These results show that the number of adders can be drastically reduced by using a multiplier-block. The resulting Farrow structure filter has a much lower complexity than the real-valued Farrow structure while providing nearly the same phase delay and amplitude response.

Example  3 

Our last example is a Farrow structure with a higher polynomial order. The prototype filters are designed using complex Chebyshev approximation and interpolated by a 5th-order polynomial. The bandwidth under consideration is from 0 to $0.75\pi$. The SOPOT coefficients are shown in Table 3. After common sub-expression elimination, the sub-filters require only 18 adders to achieve a peak ripple error of 0.026376 and a maximum phase delay error of 0.0059. The frequency responses of the proposed structure and its real-valued counterpart are shown in Figure 4. It can be seen that they are very close to each other. The performance and arithmetic complexity of the various implementations are summarized in Table 4.

Table 1. Coefficients of the Lagrange-based FD-DF.

Table 2. SOPOT coefficients for the proposed Farrow structure.

Table 3. SOPOT coefficients for the proposed Farrow structure.

                                              Example 1   Example 2   Example 3
Real-valued (Multipliers)                     4           32          48
SOPOT (Adders)                                N/A         48          78
Multiplier-block (Adders)                     3           13          18
% of adders reduction                         25%         72.92%      76.92%
Design time used on Pentium-400 (Minutes)     N/A         4           6

Table 4. Comparison between various implementation schemes.
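The reduction percentages in Table 4 can be cross-checked from the adder counts. The short sketch below assumes that the three value columns of the table correspond to Examples 1 to 3 and that the reduction is measured relative to the plain SOPOT implementation; it covers only the two examples for which a SOPOT adder count is given.

```python
sopot_adders = {"Example 2": 48, "Example 3": 78}   # plain SOPOT implementation
mb_adders = {"Example 2": 13, "Example 3": 18}      # with multiplier-block

reductions = {
    name: 100 * (1 - mb_adders[name] / sopot_adders[name])
    for name in sopot_adders
}
for name, value in reductions.items():
    print(f"{name}: {value:.2f}%")
```

Under these assumptions the computed values agree with the 72.92% and 76.92% listed in the table.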


V.  Conclusion 

A new method for the design of Farrow-based FD-DF using sum-of-powers-of-two (SOPOT) coefficients is proposed. This method has the advantages of a short design time and a good frequency response of the Farrow structure, and is able to reduce the number of terms in the SOPOT coefficients in order to reduce hardware complexity. Design examples show the robustness of this method for designing Farrow structure filters with different specifications.

Figure 1. Original Farrow structure.

Figure 2. Proposed implementation of the Farrow structure.


Figure 3. Farrow structure of Lagrange interpolator in Example 1.

Figure 4. Frequency and phase delay responses in Example 3.

Abstract

This paper proposes a novel algorithm for the design and hardware reduction of a class of multiplier-less two-channel PR filter banks (FBs) using sum-of-powers-of-two (SOPOT) coefficients. It minimizes a more realistic hardware cost, such as the number of adder cells, subject to a prescribed output accuracy, taking into account the rounding and overflow effects, instead of using just the number of SOPOT terms as in conventional methods. Furthermore, by implementing the filters in the FBs using a multiplier-block (MB), a significant overall saving in hardware resources can be achieved. An effective random search algorithm is also proposed to solve the design problem, which is also applicable to PR IIR FBs with highly nonlinear objective functions.

I.  Introduction 

Perfect reconstruction (PR) multirate filter banks (FB) have important applications in signal analysis, signal coding and the design of wavelet bases. A number of techniques for designing linear-phase and low-delay PR two-channel filter banks are now available [1][2][3]. Recently, there has been an increasing interest in designing PR filter banks with very low implementation complexity. One of the applications is to provide an efficient hardware implementation of the 9/7 wavelet filter for the JPEG2000 standard. FBs using sum-of-powers-of-two (SOPOT) coefficients are particularly attractive for VLSI or hardware implementation because multiplication by SOPOT coefficients can be implemented efficiently using hard-wired shifters and adders only (i.e. multiplier-less). The design of such SOPOT PR FBs using the 2-channel lossless lattice structure and a genetic algorithm was studied in [6]. Another family of multiplier-less PR two-channel FIR/IIR FBs and wavelets, using SOPOT coefficients and the structure in [1], was studied recently by the authors in [4][7]. They are attractive because of their low hardware and design complexities. Furthermore, the PR condition is structurally imposed and is robust to coefficient quantization.

It is well known that there are two sources of error in implementing a digital filter: coefficient round-off error and signal round-off error [10]. Coefficient round-off error happens when the real-valued coefficients of the filter, obtained say by the Park-McClellan algorithm, are rounded to their fixed-point representations to simplify the hardware implementation. The frequency response of the filter is therefore changed, and might not satisfy the specification any more. On the other hand, signal round-off error arises when overflow occurs due to insufficient internal wordlength and improper scaling, and when rounding is performed on long intermediate data after multiplications with the filter coefficients. Signal round-off error is usually more difficult to handle in hardware implementation because complicated hardware for detecting overflows, etc., would significantly slow down the throughput of the system. The SOPOT FBs mentioned above are free from coefficient round-off noise because the FBs are optimized using the SOPOT coefficients as variables. Unfortunately, most of these methods only focus on minimizing the number of SOPOT terms to meet a given frequency specification, and pay little attention to signal round-off error. In order to satisfy a given output accuracy, one usually employs a fixed and long wordlength for all intermediate data, which means increased hardware complexity. Therefore, the design problem should be to minimize the hardware complexity of the system while satisfying the given frequency specification and the output accuracy. The hardware complexity could be the number of adder cells and registers used in the FBs, which is related to the exact wordlength used for each intermediate data. The output accuracy of a digital filter is usually specified statistically by its output noise power due to the rounding operations performed, using a given noise model. For fine quantization, round-off noise is usually modeled as white and uncorrelated with the signal and other noise sources. To satisfy a given output accuracy (say 16-bit), one has to determine the appropriate scaling and wordlength of each intermediate data to avoid signal overflow and to achieve a noise power less than the given specification (say -96 dB for 16-bit accuracy).

The purpose of this paper is to provide a solution to the above
problem with particular emphasis on the SOPOT FBs that we have
proposed in [4][7]. This class of PR FBs is chosen because the
required stopband attenuation and system delay can easily be
achieved using a simple design formula for order estimation and the
efficient Park-McClellan design algorithm. Using the real-valued
coefficients so obtained as initial guess, the SOPOT coefficients and
the internal wordlength of all intermediate data are jointly optimized
using a novel random search algorithm to minimize some measure
of the hardware complexity, while satisfying the given specification.
In this work, both the number of adders and their adder cells are
minimized because they constitute over 70% of the total hardware
cost as compared with other components such as latches. The
random search algorithm is similar to the mutation of the genetic
algorithm (GA) and the random walk in simulated annealing. The
main difference here is that we have limited its search space to a
small neighborhood of the real-valued solution obtained in [4][7]
using the Park-McClellan algorithm. This greatly shortens the
search time to a few minutes. Moreover, for IIR FBs, excellent
SOPOT solutions can be obtained in a reasonable time, which
cannot be achieved by GA even with design time several orders of
magnitude longer. The latter is mainly due to the high sensitivities of
the poles. The number of adders required to implement the SOPOT
multiplications is further reduced by using the technique of
"multiplier-block" (MB) [9]. By using MB, redundancy in the
SOPOT coefficients is removed. Design examples demonstrate
that our design method is very efficient and capable of reducing
dramatically the hardware complexity of the FBs, while meeting the
given specifications. More difficult 2-channel SOPOT IIR PR FBs
can also be designed using the proposed method. Our paper is
organized as follows: in Section II, the SOPOT FBs considered and
the MB technique will be described. The round-off noise and
overflow problems will be addressed in Section III. Section IV is
devoted to the 'random search' design algorithm. This is followed
by several design examples in Section V. Finally, conclusions are
drawn in Section VI.

II. 2-Channel PR SOPOT FB
Fig. 1 shows the structure of the PR FB proposed in [5]. The
functions $\alpha(z)$ and $\beta(z)$ can be linear-phase FIR, nonlinear-phase
FIR, or IIR functions, without affecting the PR conditions. It can be
shown that the lowpass and highpass analysis filters are given by

$H_0(z) = \left(z^{-2N} + z^{-1}\beta(z^2)\right)/2$ and $H_1(z) = -\alpha(z^2)H_0(z) + z^{-2M-1}$,

respectively. It is also possible to realize wavelet bases from these
FBs by imposing certain regularity conditions on $H_0(z)$ and
$H_1(z)$. Details regarding their design can be found in [4]. In the
multiplier-less FB [4][7], each coefficient in $\alpha(z)$ and $\beta(z)$ is
represented as the following sum of powers-of-two (SOPOT) or
canonical signed digit (CSD) coefficient:

$b = \sum_{k=0}^{L-1} a_k \cdot 2^{b_k}$,

where $a_k$ is either 1 or -1, and $b_k \in \{-l_L, \ldots, -1, 0, 1, \ldots, l_U\}$. The larger the
numbers $l_L$, $l_U$, and $L$, the closer the SOPOT approximation will
be to the original real number. In practice, the number of non-zero
terms is usually kept small while satisfying a given
specification so that the multiplication can be implemented as a
limited number of shift and add (subtract) operations, giving rise to
a multiplier-less realization. Multiplier-less filter banks and wavelet
bases with linear phase and low system delay can be obtained from
this structure by searching for the SOPOT coefficients using the
genetic algorithm [6][7]. As mentioned earlier, the number of
adders needed to implement $\alpha(z)$ and $\beta(z)$ can further be reduced
by rewriting them in transposed form. It can be seen that instead of
multiplying the delayed input samples with the filter coefficients as
in the direct form, the input sample is now multiplied with all the
coefficients. This can be efficiently implemented using a multiplier
block (MB) [9]. Consider a simple example with two filter
coefficients: 3 and 21. The SOPOT representations of these two
numbers are $3 = 2^1 + 1$ and $21 = 2^4 + 2^2 + 1$. This requires 3 adders
and 3 shifts. If implemented in an MB, the multiplication of the input
with the coefficient 3 will also be generated by decomposing 3 as
$2^1 + 1$, requiring one addition. The multiplication with 21, however,
can be simplified by re-using the intermediate result generated by
the first filter coefficient '3' as $21 = 3 \cdot 7 = 3 \cdot (2^3 - 1)$. That is, the
intermediate result, after multiplication by 3, is multiplied by 7,
which requires one less adder than generating 21 directly. In
principle, it is possible to remove all the redundancy found in the
constant multipliers, leading to a realization with the minimum
number of adders. This can drastically reduce the number of adders
required for realizing such FBs when there is a large number of
filter coefficients to be implemented in the transposed form FIR
structure (around 50% in our example).
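
The adder count in the 3-and-21 example can be checked with a small sketch. The function names below are hypothetical and the integer shifts only stand in for hardwired shifts; this is an illustration of the idea, not the MB generation algorithm of [9].

```python
def direct_products(x):
    """Multiply x by 3 and 21 independently using shift-and-add."""
    p3 = (x << 1) + x                # 3x  = 2x + x        -> 1 adder
    p21 = (x << 4) + (x << 2) + x    # 21x = 16x + 4x + x  -> 2 adders
    return p3, p21, 3                # total adders used


def multiplier_block(x):
    """Multiply x by 3 and 21, re-using 3x to build 21x = 7 * (3x)."""
    p3 = (x << 1) + x                # 3x  = 2x + x        -> 1 adder
    p21 = (p3 << 3) - p3             # 21x = 8*(3x) - 3x   -> 1 adder (subtractor)
    return p3, p21, 2                # total adders used


if __name__ == "__main__":
    for x in (1, -5, 100):
        print(x, direct_products(x), multiplier_block(x))
```

Both routines produce the same products; the second shares the partial result 3x and so saves one adder, which is the redundancy removal described above.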

III. Round-Off Noise and Overflow Analyses
1. Analysis of Round-off Noise

As mentioned earlier, round-off noise occurs when rounding
is performed during arithmetic computation. In fixed-point
arithmetic, the round-off operation is usually performed after
multiplication to limit the wordlength of the intermediate data in
order to save hardware resources. Round-off error is thus generated.
Because rounding errors are difficult to analyze exactly, they
are usually treated as a white random process, uncorrelated with the
signal and other noise sources. For the rounding operation, the quantization
noise will have zero mean and a variance $\sigma^2 = \Delta^2/12$, where $\Delta$
is the quantization step-size, which is determined by the number of
fractional bits retained after multiplication.

Consider the transposed form FIR filter in Fig. 2. The
blocks D and Q{.} represent respectively a register and the round-off
operator. Any signal in this filter, for example the input signal $x[n]$,
has a fixed-point representation of the form <n|m>, which means
that the total wordlength is n + m bits, where n represents the
integer bits (including the sign bit) and m the fractional bits. For
notational convenience, any signal will be represented as
$x[n]$: <n|m>, meaning that it has n integer bits and m fractional
bits. Now, consider the input sample $x[n]$: <1|7>, which is an 8-bit
number gated into the digital filter at every clock cycle. It will be
multiplied by $h[0]$: <1|9> and $h[1]$: <1|7>. If no rounding is
performed, the fixed-point formats of the products $x[n]h[0]$ and
$x[n]h[1]$ will be <1|16> and <1|14>, respectively. Suppose that
the products are rounded by the operator Q{.} to the format
<1|14>. Since the wordlength of $x[n]h[1]$ before and after
rounding is equal, there is no round-off noise ($e_1[n] = 0$).
For the signal $x[n]h[0]$, the wordlength is shortened from
<1|16> to <1|14>; hence, a round-off noise $e_0[n] \neq 0$ with a
power of $P = (2^{-13})^2/12$ is generated. In general, if $R$, the number
of bits in the fractional part of the fixed-point representation, is
rounded to $B$ ($B < R$), then the round-off noise power is given
by $P_k = 2^{-2(B-1)}/12$ [10]. If there are $M$ such rounding noise
sources in the transposed form, the total noise power at the output is
given simply by their sum:

$P_{total} = \sum_{k=1}^{M} P_k = \sum_{k=1}^{M} 2^{-2(B_k-1)}/12$.

For a general digital filter, the k-th noise source might pass through a
transfer function with z-transform $H_k(z) = \sum_{n=0}^{L_k} h_k(n) z^{-n}$. Then the
total output noise power is

$P_{total} = \sum_{k=1}^{M} P_k \|H_k\|_2^2$, with $\|H_k\|_2^2 = \sum_{n=0}^{L_k} |h_k(n)|^2$,

assuming that the noise sources are uncorrelated. The output accuracy, in terms of the number of
fractional bits, is therefore given by $-(1/6) \cdot 10 \cdot \log_{10}(P_{total})$. In
general, to have 16-bit output accuracy, the output noise power must
be below the -96 dB level. From these results, we can see that the
larger the number of noise sources, the lower the accuracy of
the computation. The noise power can however be reduced by
increasing the wordlength for the fractional bits, at the expense of
increased hardware complexity.
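
The noise bookkeeping above can be sketched numerically. The helper names are hypothetical, and the per-source power follows the convention stated in the text, $2^{-2(B-1)}/12$; the impulse responses used in the usage lines are made-up values for illustration only.

```python
import math


def rounding_noise_power(b):
    """Noise power when a product is rounded to b fractional bits,
    using the convention of the text: 2^(-2(b-1)) / 12."""
    return 2.0 ** (-2 * (b - 1)) / 12.0


def total_noise_power(sources):
    """Sum the contributions of uncorrelated noise sources.

    `sources` is a list of (b, h) pairs: b is the number of fractional
    bits kept, h is the impulse response from that source to the output
    (use [1.0] when the source is added directly to the output)."""
    return sum(rounding_noise_power(b) * sum(c * c for c in h)
               for b, h in sources)


def accuracy_bits(p_total):
    """Output accuracy in fractional bits: -(1/6) * 10 * log10(P)."""
    return -10.0 * math.log10(p_total) / 6.0


if __name__ == "__main__":
    p = total_noise_power([(18, [1.0]), (18, [0.5, 0.5]), (20, [1.0])])
    print(p, 10 * math.log10(p), accuracy_bits(p))
```

Each 6 dB of noise power corresponds to one bit, so a total of -96 dB maps to 16 fractional bits, in line with the figure quoted above.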

2.  Preventing  Overflow 

Another important source of error is signal overflow [10],
which occurs when the allocated wordlength in the integer part is
insufficient to represent correctly the fixed-point representation of
the output after addition (such as the adders in Fig. 2). In order to
avoid overflow, we must allocate more bits to the integer part of the
register (say D in Fig. 2). We are given the option to retain or
decrease the number of bits in the fractional part, depending on the
required accuracy. To determine whether overflow will occur for a
given adder, we can compute certain measures of the transfer
function from the input to this particular adder. Here, we prefer a
more conservative measure using the absolute sum of the impulse
response, i.e. L1 scaling. For example, let $x_{max}$ be the maximum
input to an FIR transposed form digital filter $H(z) = \sum_{k=0}^{L} h(k) z^{-k}$ as
shown in Fig. 2. Then the maximum (or worst-case) value at the
output of the l-th adder of the FIR filter is

$\left(\sum_{k=l}^{L} |h(k)|\right) x_{max}$, $l = 0, \ldots, L$.

From these values, it is possible to
determine the required integer wordlength at each position to avoid
any overflow. The number of fractional bits will be optimized to
satisfy the given output accuracy. It should be noted that there are
other scaling methods, such as L2 scaling, which can also be used.
With these, however, there is still a small probability that overflow will occur.
In digital signal processors, special hardware is usually used to detect
the presence of overflow, and the result is clipped to the
maximum/minimum values of the representation (saturation
arithmetic).
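
A minimal sketch of the L1 bound follows, assuming the adders of the transposed form are indexed so that adder l accumulates taps l through L, and that a format with n integer bits (sign included) covers the range [-2^(n-1), 2^(n-1)). The function names and the coefficient values are hypothetical.

```python
def l1_adder_bounds(h, x_max):
    """Worst-case magnitude at the output of each adder of a transposed
    form FIR filter: x_max * sum_{k=l}^{L} |h(k)| for l = 0..L."""
    return [x_max * sum(abs(c) for c in h[l:]) for l in range(len(h))]


def integer_bits(bound):
    """Smallest number of integer bits n (sign bit included) such that
    bound < 2^(n-1), so a value of that magnitude cannot overflow."""
    n = 1
    while bound >= 2 ** (n - 1):
        n += 1
    return n


if __name__ == "__main__":
    h = [0.5, 0.25, -0.5]      # made-up coefficients for illustration
    bounds = l1_adder_bounds(h, x_max=1.0)
    print(bounds, [integer_bits(b) for b in bounds])
```

The bound is conservative: it assumes every tap sees the worst-case input with the worst-case sign, which is why it guarantees that no overflow occurs.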

IV. The Design Algorithm

Our design method consists of two parts. First, the parameters of the
filters $\alpha(z)$ and $\beta(z)$, such as their coefficients and their orders
(parameters N and M), are determined from the frequency
specification (system delay, stopband attenuation, cutoff
frequencies) using the method in [4]. Then, the SOPOT coefficients
are determined using a random search algorithm to generate the MB
(see 1 below). The hardware complexity of the FBs is then
minimized while maintaining the output accuracy using the noise
models mentioned earlier (see 2 below).

1. Search for the SOPOT filter coefficients

The optimization procedure consists of two stages. First, a
random search algorithm, to be discussed in the sequel, is used to
search for the SOPOT coefficients of $\alpha(z)$ and $\beta(z)$ such that a
given performance measure is minimized. Then, the minimum
number of adders needed in the multiplier block is determined. The
generation of the multiplier block from the SOPOT coefficients
follows the algorithms proposed in [9]. Let $x_r$ be the vector
containing the real-valued coefficients of $\alpha(z)$ and $\beta(z)$ obtained
by the method in [4]. The principle of the random search algorithm
is to generate random candidate SOPOT coefficients in the
neighborhood of $x_r$ so as to search for the optimal discrete
solution. More precisely, a new coefficient vector $x_{NEW}$ is
generated by adding a random vector to the original coefficient
vector $x_r$ to form $x_{NEW} = [x_r + \alpha \cdot x_R]_{SOPOT}$, where $\alpha$ is a scale
factor which controls the size of the neighborhood to be searched,
$x_R$ is a vector with its elements being random numbers in the range
$[-1, 1]$, and $[\cdot]_{SOPOT}$ is a rounding operator which converts its
argument to the nearest SOPOT coefficients with the maximum number
of terms in each coefficient being $L$ and dynamic range $l_U$ and $l_L$.
The following objective function, which is the minimax error
between the desired frequency response $H_d(e^{j\omega})$ and the
frequency response $H(e^{j\omega}, x_{NEW})$ calculated using the candidate $x_{NEW}$ in
the frequency band of interest $\omega \in S$, is minimized:

$score = \max_{\omega \in S} \left| H(e^{j\omega}, x_{NEW}) - H_d(e^{j\omega}) \right|$.  (1)

The process is repeated with different random vectors $x_R$ so that the SOPOT
space in the neighborhood of $x_r$ is sampled randomly. Since the
sampled solutions are close to the real-valued optimal solution, their
frequency responses will also be close to the ideal one, but with
different hardware complexity. The set that yields the minimum
score with a given number of terms is recorded. As this is a random
search algorithm, the longer the searching time, the higher the
chance of finding the optimal solution.
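
The search loop can be sketched as follows. This is only an illustration under stated assumptions: `round_sopot` is a hypothetical greedy rounding that approximates, but does not guarantee, the nearest SOPOT value; the candidate is treated as a plain FIR filter; and the response of the real-valued design stands in for the desired response $H_d$.

```python
import cmath
import math
import random

GRID = [math.pi * i / 64 for i in range(65)]   # band of interest: [0, pi]


def round_sopot(x, max_terms=3, l_lower=8, l_upper=0):
    """Greedy approximation of x by at most max_terms signed powers of
    two with exponents restricted to [-l_lower, l_upper]."""
    approx = 0.0
    for _ in range(max_terms):
        r = x - approx
        if r == 0.0:
            break
        e = max(-l_lower, min(l_upper, round(math.log2(abs(r)))))
        term = math.copysign(2.0 ** e, r)
        if abs(r - term) >= abs(r):            # extra term would not help
            break
        approx += term
    return approx


def freq_response(h, w):
    return sum(c * cmath.exp(-1j * w * n) for n, c in enumerate(h))


def score(h, h_desired):
    """Minimax error of equation (1) evaluated on GRID."""
    return max(abs(freq_response(h, w) - d) for w, d in zip(GRID, h_desired))


def random_search(x_r, alpha=0.02, iters=500, seed=0):
    rng = random.Random(seed)
    h_desired = [freq_response(x_r, w) for w in GRID]
    best = [round_sopot(c) for c in x_r]       # plain rounding as start
    best_score = score(best, h_desired)
    for _ in range(iters):
        # x_NEW = [x_r + alpha * x_R]_SOPOT with x_R uniform in [-1, 1]
        cand = [round_sopot(c + alpha * rng.uniform(-1.0, 1.0)) for c in x_r]
        s = score(cand, h_desired)
        if s < best_score:
            best, best_score = cand, s
    return best, best_score


if __name__ == "__main__":
    print(random_search([0.0372, 0.2931, 0.5, 0.2931, 0.0372]))
```

Because the search starts from the directly rounded coefficients and only accepts improvements, the returned score is never worse than plain rounding; a longer run simply samples more of the neighborhood.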

2.  Minimization  of  the  filter  banks  hardware  structures  with 
prescribed  output  accuracy 

After the MB is generated, the maximum wordlength of all
the products, $x[n]h[i]$, $i = 0, \ldots, L$, in Fig. 2, is calculated. If we do not
perform any rounding using the operator Q{.}, and sufficient
wordlength is allocated to all adders, then there is no rounding error.
Of course, this will require excessive hardware cost, especially
when the required output accuracy is low. Our goal is to determine the
format of the rounded signals, $Q\{x[n]h[i]\}$, $i = 0, \ldots, L$, to satisfy the
output accuracy. Suppose that the formats are stored in a vector $\delta$.
Given the rounded output format of the MB, $\delta$, one can determine,
using the method described in Section III.2, the formats of the
registers, D's, and the structure of the adders, in order to avoid any
overflow. The fractional part of those scaled outputs, to prevent
overflow, can either retain its wordlength or reduce it by one as
mentioned in Section III.2. This option is stored in a vector $\delta_f$, to
be optimized together with $\delta$. The noise power at the
filter output is readily computed according to the analysis
described in Section III.1. Note that the output noise power from
$\alpha(z)$ and $\beta(z)$ will be evaluated and their contributions at the
lowpass (and highpass) analysis filters will be properly summed,
using their respective power transfer functions mentioned in Section
III.1. Our design algorithm seeks to lower the wordlength of each
intermediate data, and hence the complexity, as specified in
$\delta$ and $\delta_f$, to minimize the hardware cost. Using $\delta$ and $\delta_f$, the
hardware cost, $C$, given by the adder cells in the MB and the
subsequent adders in Fig. 2, can be evaluated. In summary, the
design problem is

$\min_{\delta, \delta_f} C(\delta, \delta_f)$ subject to $P_{total} < P_{spec}$,  (2)

where $P_{total}$ is the output noise power at the lowpass and highpass
filters and $P_{spec}$ is the specified output accuracy. Using a random
search algorithm similar to that mentioned in Section IV.1, the
vector $(\delta, \delta_f)$ is searched in the neighborhood of its full
precision value $(\delta, \delta_f)_0$ (that is, no rounding) for feasible
solutions that satisfy the given output accuracy. The one with
the minimum hardware cost $C(\delta, \delta_f)$ is declared as the solution of
this problem. There are several advantages of this algorithm. First
of all, with the computational power of today's personal
computers (PCs), the time for obtaining high quality solutions is
manageable, especially when an initial real-valued solution is
available by some means. In fact, for the problem considered here,
the overall design time is less than 10 minutes using a Pentium-400
PC with Matlab 5.3, including the design of the SOPOT
coefficients, the generation of the MB and the internal wordlength
allocation. Secondly, it is applicable to problems with general
objective functions, possibly with very complicated inequality
constraints, as illustrated in this work. It is also possible to combine
the search with the MB generation process for better
performance, but the computational time will be greatly increased.
We now present a few design examples.
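
The constrained search of equation (2) can be sketched in a much simplified form. Everything here is an assumption made for illustration: the cost is taken as the sum of the fractional wordlengths (a stand-in for the adder-cell count), each rounded product is one noise source with a given power gain to the output, and the option vector $\delta_f$ is omitted.

```python
import random


def noise_power(bits, full_bits, gains):
    """Total output noise: a source contributes only if it is actually
    rounded (bits < full precision), using 2^(-2(b-1))/12 per source."""
    return sum(g * 2.0 ** (-2 * (b - 1)) / 12.0
               for b, f, g in zip(bits, full_bits, gains) if b < f)


def search_wordlengths(full_bits, gains, p_spec,
                       iters=2000, max_drop=6, seed=0):
    """Randomly sample wordlength vectors near full precision and keep
    the cheapest one whose noise power stays below p_spec."""
    rng = random.Random(seed)
    best = list(full_bits)                 # no rounding: always feasible
    for _ in range(iters):
        cand = [max(1, f - rng.randint(0, max_drop)) for f in full_bits]
        feasible = noise_power(cand, full_bits, gains) < p_spec
        if feasible and sum(cand) < sum(best):
            best = cand
    return best


if __name__ == "__main__":
    full = [20, 18, 20, 22]                # made-up product wordlengths
    gains = [1.0, 0.5, 0.5, 0.25]          # made-up noise power gains
    p_spec = 10 ** (-9.6)                  # -96 dB, i.e. 16-bit accuracy
    best = search_wordlengths(full, gains, p_spec)
    print(best, noise_power(best, full, gains))
```

Starting from the full-precision vector guarantees that a feasible solution is always available, which mirrors the role of $(\delta, \delta_f)_0$ in the text.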

V.  Design  Examples 

5.1. Two-channel PR FBs with $\beta(z)$ and $\alpha(z)$ FIR filters

To demonstrate the effectiveness of our algorithm for solving the
complicated design problem, a two-channel FB with the following
frequency specification is designed: passband and stopband cutoff
frequencies $\omega_p = 0.4\pi$ and $\omega_s = 0.6\pi$, respectively; stopband
attenuation of 39 dB; system delay = 23. From the design procedure
in [4], the parameters N and M are determined to be 3 and 8,
respectively. The wordlength of the input is 8-bit and is normalized
to be less than 1, i.e. in <1|7> format. The required output accuracy
is at least 16-bit for the fractional part without overflow. The frequency
response of the final SOPOT FB is shown in Fig. 3, and the details
of its optimized structure are summarized in Table 1. The reduction
of the number of adders obtained by using MBs to implement $\beta(z)$
and $\alpha(z)$ is around 50%. It can also be observed that the number of
adder cells is reduced by 27% (compared with a fixed
wordlength of 24 bits using MBs to satisfy the same output
accuracy) by using the proposed random search algorithm to minimise
the necessary internal wordlength, while satisfying the prescribed
output accuracy of 16-bit. The overall design takes about 10 minutes
on a typical Pentium-533 computer.

5.2. Two-channel PR FB with $\beta(z)$ IIR and $\alpha(z)$ FIR

To demonstrate the effectiveness of our random search algorithm in
designing SOPOT PR IIR FBs, $\beta(z)$ is chosen as an IIR filter and
$\alpha(z)$ as an FIR filter. In order to guarantee the stability of the IIR
filter, the denominator of $\beta(z)$ is realized as a lattice structure
and the magnitudes of the lattice coefficients are forced to be less
than 1. They are then used as optimization variables in the random
search algorithm. The design specifications are: passband cutoff
frequency $\omega_p = 0.4\pi$, stopband cutoff frequency $\omega_s = 0.6\pi$. N and M
are determined to be 4 and 11, respectively. The real-valued filter
coefficients are obtained by the method in [11]. The SOPOT
coefficients of the FBs obtained by the proposed algorithm are
shown in Table 2, and the frequency response is shown in Fig. 4.
The frequency characteristic is very good despite the high
nonlinearity of the objective function for the IIR FBs. From our
experience, similar results cannot be achieved by GA even with
design time several orders of magnitude longer. The latter is mainly
due to the high sensitivities of the poles. Since only the SOPOT
coefficient optimization is performed, the computation time is much
shorter, only 6 minutes in this case. The hardware structure is
omitted here due to the page length limitation.

VI.  Conclusion 

A novel algorithm for the design and hardware reduction of a class
of multiplier-less two-channel PR FBs using SOPOT is presented. It
minimizes a more realistic hardware cost, such as adder cells,
subject to a prescribed output accuracy, taking into account rounding
and overflow effects. Further, by implementing the filters in the FBs
using multiplier blocks (MB), significant overall saving in hardware
resources can be achieved. An effective random search algorithm
is also proposed to solve the design problem, which is also
applicable to PR IIR FBs with highly nonlinear objective functions.

Overflow possibility: Zero
Input format: <1|7>
Output accuracy (fractional side): -96.6 dB (accuracy > 16-bit)
Output wordlength: 23-bit
Number of adders in the MB:
Estimated number of adder cells (with fixed wordlength of 24-bit using MBs):
Estimated number of adder cells (with optimized wordlength using MBs): g25 (saved 27%)

Table  1.  Filter  banks  results  of  example  5.1,  PWL  :  product  word 
length.  Reg. :  register. 
Fig. 2. Typical digital FIR filters with round-off noise model.

Fig. 3. Frequency responses of the two-channel FB in example 5.1.



Fig.  4.  Frequency  responses  of  the  two-channel  FB  in  example  5.2 


Table  2.  Filter  banks  results  of  example  5.2. 
ABSTRACT

In this work we derive the optimal biorthogonal filter bank, in the
ideal case, employing the proposed minimization of quantization
noise amplification. In the ideal case it turns out to be a paraunitary
filter bank which completely decorrelates the input signal,
followed by scalar DPCM on each of the subbands. Its coding gain
is shown to be equal to that of the ideal scalar DPCM coder acting
on the original signal, irrespective of the number of channels. In contrast,
the previously known optimal biorthogonal filter bank attains this
coding gain only when the number of channels tends to infinity.
The coding gain advantage of the new optimal filter bank structure,
in the FIR case, is verified. The minimization of quantization
noise amplification is also used to maximize the coding gain of
a given biorthogonal filter bank. The coding gain improvements
are verified for low bit-rates by measuring the reconstruction error
introduced in coding an AR(1) source.

1.  INTRODUCTION 

The coding gain of a biorthogonal subband coder using the additive
uncorrelated white noise model for the quantizers is given
by equation (1) for the optimum bit allocation case, where $\sigma_x^2$
denotes the input signal variance, the $\sigma_{x_i}^2$'s denote the subband variances,
$F_i(e^{j\omega})$ denotes the frequency response of the ith synthesis
filter and $M$ is the number of channels. Figure (1) shows the block
diagram of a subband coder using the polyphase representation of
the filter bank [9].

$CG = \dfrac{\sigma_x^2}{\left( \prod_{i} \sigma_{x_i}^2 \int_{-\pi}^{\pi} |F_i(e^{j\omega})|^2 \frac{d\omega}{2\pi} \right)^{1/M}}$  (1)

The denominator of the coding gain expression contains terms
equal to the energy of the synthesis filters, which is the same as the
squared norm of the synthesis filters. By the norm of an FIR filter,
we mean the magnitude of the vector whose components
are the filter coefficients. These terms represent the quantization
noise amplification taking place in the filter bank. When the white
quantization noise vector passes through the synthesis polyphase
matrix R(z), it becomes coloured and the variance is amplified.
This suggests that if the noise psd is appropriately modified,
the amplification can be minimized and the coding gain can
be maximized. Modifying the psd of the quantization noise vector
can be done by passing the quantization noise through a colouring
filter, which is a multiple input multiple output filter. Let us denote
it by A(z). This quantization noise filtering is carried out by
adding an appropriate linear combination of the previously known
quantization noise components to the present signal component
being quantized.

In [8] the problem of optimal paraunitary filter banks is solved
and a construction method is given for any arbitrary psd. The optimal
biorthogonal filter bank has been shown to be the optimal
paraunitary filter bank followed by a scalar filter in each channel
which is the ideal half-whitening filter for that channel [6]. The
half-whitening arises because of the quantization noise amplification
occurring in a biorthogonal filter bank.


Fig.  1.  Subband  coder  using  a  PR  filter  bank. 


2.  OPTIMAL  BIORTHOGONAL  FILTER  BANK  WITH 
MINIMIZATION  OF  QUANTIZATION  NOISE 
AMPLIFICATION 

Figure  (2)  shows  how  the  quantization  noise  modification  is  done 
in  a  filter  bank  employing  gain  plus  additive  noise  model  for  quan¬ 
tizers.  D  is  a  diagonal  matrix  having  diagonal  elements  equal  to 
1/qi’s.  Note  that  the  additive  uncorrelated  white  noise  quantizer 
model  is  a  special  case  arising  when  ai’s  are  1.  Now  we  con¬ 
sider  what  would  be  the  optimal  biorthogonal  filter  banks  if  we  do 
the  modified  quantization.  The  additive  uncorrelated  white  noise 
model  is  assumed  for  the  quantizers.  Let  the  analysis  polyphase 
matrix  be  A(z)E(^),  where  E(z)  is  the  polyphase  matrix  corre¬ 
sponding  to  the  paraunitary  filter  bank  which  completely  decorre¬ 
lates  the  subband  signals.  Since  the  signals  are  completely  decor- 
related,  it  is  assumed  that  there  may  not  be  any  loss  of  generality 
in  restricting  A(z)  to  be  diagonal.  And  we  prove  it  in  the  sub¬ 
sequent  section  by  showing  that  this  structure  indeed  achieves  the 
maximum  coding  gain  possible,  that  is  the  gain  of  the  ideal  DPCM 
coder  [4].  Now  the  problem  is  to  find  out  the  optimum  colour¬ 
ing  filter  A (z).  To  avoid  delay-free  loops,  the  restriction  on  A (z) 
is  that  its  zeroth  order  coefficient  matrix  must  be  lower  triangular 
with  l’s  along  the  diagonal.  Considering  the  fact  that  the  parau¬ 
nitary  part  of  the  synthesis  filter  does  not  introduce  any  noise  am- 
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Fig.  2.  Performing  quantization  noise  filtering  in  a  PR  filter  bank 
employing  gain  plus  additive  noise  model  of  the  quantizer. 


plification,  it  is  seen  that  we  have  to  compensate  only  for  the  am¬ 
plification  by  the  diagonal  matrix  A-1  (z).  It  is  easily  seen  that, 
to  minimize  the  norms  of  the  filters  represented  by  the  columns 
of  A_1(2)A(z),  A (2)  also  needs  to  be  diagonal.  Let  A,(z)  be 
the  ith  diagonal  element  of  A(z).  Afizfs  are  scalar  polynomials 
with  1  as  the  constant  coefficient.  The  quantization  noise  variance 
introduced  at  the  output  by  the  channel  i  is  given  by. 


where  Gf  is  the  variance  of  the  output  of  the  ideal  predictor  when 
the  input  is  ,-r,  (71).  In  this  case  it  can  be  seen  that,  the  inequality 
(4)  is  satisfied  with  equality.  The  condition  for  a2.  achieving  its 
lower  bound  B  is. 


A,-(or 

C  ‘> 


-4,(r^)| 


(e^) 

which  simplifies  using  equation  (5)  to 


_  (k'Gi 


A,(0  = 


(6) 


(7) 


Thus,  neglecting  the  scaling,  the  optimum  A i(z)  as  well  as  the 
optimum  Aj(z)  is  the  whitening  filter  of  the  ith  subband  signal. 
A  scaling  on  the  A,(z)  will  not  affect  the  quantization  noise  as 
its  effect  is  undone  by  the  A~'(z).  The  coding  gain  expression 
depends  only  on  the  magnitude  of  A,(z)'s  and  .4,(z)’s  and  their 
phase  is  irrelevant.  So  by  choosing  A,  (z)  as  equal  to  the  optimum 
.4,  (.;),  it  can  be  ensured  that  stable  inverse  filter  exists,  if  A,(z)  is 
designed  to  be  optimal  minimum  phase  linear  predictor  [4]. 

Using  equations  (5)  and  (7),  it  can  be  shown  that  the  output 
noise  variance,  with  optimum  bit  allocation,  is 
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2 
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=  c2~21' 


(8) 


4  =  c2~2!i'  f*  S,Ti.rt(e3ul)  |A,(eJ")|2  ^ 

J  —  7T 

f  |-4,(^)Ar1(e^')  2  g  (2) 

where  ,r,(n)  is  the  i,h  subband  signal  at  the  output  of  E(s),  c  is 
the  constant  of  proportionality  of  the  quantizers  and  A,-  (a)  is  the 
ith  diagonal  entry  of  A (z).  Using  Cauchy-Schwarz  inequality,  a2. 
is  lower  bounded  by  B  where 


Due  to  uncorrelated  assumption,  the  total  output  noise  variance  is 
the  average  of  a2,  from  each  channel.  Therefore,  to  minimize  the 
total  output  noise,  each  a2,  is  to  be  minimized.  Applying  Cauchy- 
Schwarz  inequality  to  B ,  we  get 

$$B \;\ge\; c\,2^{-2b_i}\int_{-\pi}^{\pi} S_{x_i x_i}(e^{j\omega})\,|A_i(e^{j\omega})|^2\,\frac{d\omega}{2\pi} \qquad (4)$$

Thus, to minimize B, it is enough to minimize the right-hand side
of equation (4). $A_i(z)$ is restricted to have 1 as the constant
coefficient. The $A_i(z)$ which minimizes the right-hand side of
equation (4) is known from the theory of linear prediction [4]. So
for any order, $A_i(z)$ is the optimal linear predictor of the ith
subband signal. In the infinite order case $A_i^{-1}(z)$ becomes the spectral
factor of the psd of the ith subband and

$$|A_i(e^{j\omega})|^2 = \frac{\sigma_{y_i}^2}{S_{x_i x_i}(e^{j\omega})} \qquad (5)$$


where b is the average bit-rate and $y_i(n)$ is the ith subband signal
at the output of $A_i(z)$. Therefore, the noise amplification has been
completely eliminated. Whereas the optimal biorthogonal filter bank
achieves only half-whitening in the presence of noise amplification,
the proposed optimal biorthogonal filter bank without noise
amplification achieves full whitening. Therefore, we name it the
full-whitening (FW) filter bank.
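
The role of optimum bit allocation in the output noise expression can be illustrated numerically. The sketch below assumes the standard high-resolution allocation rule $b_i = b + \tfrac{1}{2}\log_2(\sigma_i^2/\text{geometric mean})$, which the text does not state explicitly; the variances, the constant c and the function name are hypothetical values chosen only for illustration.

```python
import numpy as np

def optimum_bit_allocation(variances, avg_bits):
    """High-resolution rule (assumed): equalize c * 2**(-2*b_i) * var_i over channels."""
    variances = np.asarray(variances, dtype=float)
    geo_mean = np.exp(np.mean(np.log(variances)))        # geometric mean of variances
    bits = avg_bits + 0.5 * np.log2(variances / geo_mean)
    return bits, geo_mean

c = 1.0                                  # hypothetical quantizer constant
variances = [1.9, 0.4, 0.1, 0.05]        # hypothetical per-channel variances
avg_bits = 4.0                           # average bit-rate b

bits, geo_mean = optimum_bit_allocation(variances, avg_bits)
noise_per_channel = c * 2.0 ** (-2.0 * bits) * np.asarray(variances)
average_noise = noise_per_channel.mean()  # equals c * 2**(-2b) * geometric mean
```

Under this rule every channel contributes the same noise, so the average noise depends on the channel variances only through their geometric mean. The rule can return fractional or negative bit counts, which a practical coder would have to round or clip.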

The FW structure can be used for finite order as well. Though
no finite order paraunitary filter bank can completely decorrelate
the subbands, the coding gain of a given paraunitary filter bank can
be increased using this structure. Since $\tilde{A}_i(z) = A_i(z)$, the
filtering by $A_i(z)$ and the modification of the quantization noise can be
done more easily using the scalar DPCM structure on each
channel. It can be shown that a coding gain improvement is ensured.

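In the case of the ideal optimum linear predictor it has been shown
that

$$\sigma_{y_i}^2 = \gamma_{x_i}^2\,\sigma_{x_i}^2 \qquad (9)$$

where $\gamma_{x_i}^2$ is the spectral flatness measure for $S_{x_i x_i}(e^{j\omega})$ [4]. It
has also been shown that

$$\sigma_{y_i}^2 = \exp\left(\int_{-\pi}^{\pi} \log_e\left[S_{x_i x_i}(e^{j\omega})\right]\frac{d\omega}{2\pi}\right) \qquad (10)$$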

In  the  following  theorem  we  show  that  the  M  channel  optimal  FW 
filter  bank,  for  any  M,  achieves  the  highest  possible  coding  gain 
for  any  filter  bank. 

Theorem  1  The  optimal  FW  filter  bank  introduced  above  attains 
the  gain  of  the  ideal  DPCM  coder. 

Proof: The coding gain of the ideal DPCM coder is given by

$$CG_{DPCM} = \frac{\sigma_x^2}{\exp\left(\int_{-\pi}^{\pi} \log_e\left[S_{xx}(e^{j\omega})\right]\frac{d\omega}{2\pi}\right)} \qquad (11)$$

Using equations (10) and (8) we get

$$CG_{FW} = \frac{\sigma_x^2}{\left(\prod_{i=0}^{M-1} \exp\left(\int_{-\pi}^{\pi} \log_e\left[S_{x_i x_i}(e^{j\omega})\right]\frac{d\omega}{2\pi}\right)\right)^{1/M}} \qquad (12)$$

So we have to show that both the denominators are equal. If $D_{FW}$
is the denominator in equation (12), then

$$D_{FW} = \exp\left(\frac{1}{M}\sum_{0 \le i \le M-1}\int_{-\pi}^{\pi} \log_e\left[S_{x_i x_i}(e^{j\omega})\right]\frac{d\omega}{2\pi}\right) \qquad (13)$$

Since E(z) completely decorrelates the subbands, the corresponding
subband filter responses are non-overlapping, because the psd
of x(n) is assumed to be non-zero everywhere. Being paraunitary,
the filters have flat tops. So it can be seen that the ith subband
filter, say $H_i(z)$, selects a segment, not necessarily contiguous, of
$M S_{xx}(e^{j\omega})$ of total width $2\pi/M$, where $S_{xx}(e^{j\omega})$ is the psd of the
original scalar signal x(n). The decimation operation causes its
M-1 images, each shifted by an amount $2\pi/M$, to be added to it, followed by
a stretching by a factor of M and a scaling by $1/M$. The Nyquist-M
property of the filter ensures that no two of the images overlap.
So it can be seen that each $S_{x_i x_i}(e^{j\omega})$ is obtained by taking the
sections of the original psd selected by $H_i(z)$, stretching by a
factor of M and reordering. Thus the area under any function of
$S_{x_i x_i}(e^{j\omega})$ is the same as M times the area of the same function of
$S_{xx}(e^{j\omega})$ over the support region of $H_i(z)$. The factor M comes
because of the stretching. So,


Fig. 3. Response of the low pass analysis filters used in the
simulation. Bold curve is for the FW filter bank, the dash-dotted curve
is for the half-whitening filter bank and the dotted curve is for the
paraunitary filter bank.


gain $CG_I$ obtained, relative to the case where no colouring filter
is used, is given by equation (15),

$$\frac{CG_I}{CG} = \left(\frac{\prod_{i=0}^{M-1} f_i^T f_i}{\prod_{i=0}^{M-1} \tilde{f}_i^T \tilde{f}_i}\right)^{1/M} \qquad (15)$$


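$$D_{FW} = \exp\left(\int_{-\pi}^{\pi} \log_e\left[S_{xx}(e^{j\omega})\right]\frac{d\omega}{2\pi}\right) \qquad (14)$$

This is because the supports of the $H_i(z)$ are disjoint and together
they fill the entire interval $(-\pi, \pi)$. Thus, $CG_{DPCM} = CG_{FW}$ and the
theorem is proved. []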
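
The key step of the proof, that the subband log-psd integrals combine into the full-band integral, can be checked numerically for ideal brick-wall filters. The sketch below is an illustration under stated assumptions: the psd is a hypothetical strictly positive function, the band edges are the contiguous choice $i\pi/M \le |\omega| < (i+1)\pi/M$, and the integrals are approximated by means over a uniform frequency grid.

```python
import numpy as np

def psd(omega):
    """Hypothetical strictly positive psd, used only for this check."""
    return 1.5 + np.cos(omega) + 0.3 * np.cos(3 * omega)

M = 4                        # number of channels
N = 8 * M * 1024             # grid size, a multiple of 2*M so the bands split evenly
omega = -np.pi + (np.arange(N) + 0.5) * 2 * np.pi / N
log_S = np.log(psd(omega))

# Denominator of the ideal DPCM gain: exp of the mean log-psd over (-pi, pi).
d_dpcm = np.exp(log_S.mean())

# Ideal band i keeps i*pi/M <= |omega| < (i+1)*pi/M. After decimation the subband
# psd is that segment stretched by M, so its mean log-psd over (-pi, pi) equals
# the mean of the full-band log-psd taken over the band.
band = np.floor(np.abs(omega) * M / np.pi).astype(int)
subband_terms = np.array([log_S[band == i].mean() for i in range(M)])

# Denominator of the FW gain: geometric mean of the subband terms, as in (13).
d_fw = np.exp(subband_terms.mean())
```

Because the bands are disjoint, equally wide and cover the whole frequency axis, `d_fw` and `d_dpcm` agree up to floating-point rounding, which mirrors the argument leading to equation (14).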


where $f_i$ is the vector representing the ith synthesis filter impulse
response, and $\tilde{f}_i$ is the vector representing the residual after
projection of $f_i$. This coding gain improvement can be shown [1] to
be independent of the order of performing the quantization among
the subband signals.


3.  OPTIMUM  COLOURING  FILTER  FOR  A  GIVEN 
BIORTHOGONAL  FILTER  BANK 

For a finite order biorthogonal filter bank the colouring filter is
not diagonal. In this section, the optimum implementable colouring
filter that minimizes the quantization noise amplification is
found for a given filter bank.

The coding gain given by equation (1) assumes the additive
uncorrelated white noise model for the quantizers. The derivation
of the optimum A(z) is done using the more general gain plus
additive noise model for the quantizers, which is more suitable
at low bit-rates [4]. To avoid delay-free loops, A(z) should be
causal and its zeroth order coefficient matrix $A_0$ should be upper
triangular or a permutation thereof. In the absence of A(z), the
output noise vector corrupting the blocked reconstructed signal
vector will be r(n) multiplied by the $1/\alpha_i$'s and filtered through R(z), the
synthesis polyphase matrix. With A(z), it will be r(n) filtered
through R(z)A(z), where the diagonal elements of the constant
coefficient of A(z) are $(1/\alpha_0, 1/\alpha_1, \ldots, 1/\alpha_{M-1})$. It can be shown
[1] that the optimum A(z) having the required properties, which
minimizes the reconstruction error variance, is obtained from the
projectors which project each synthesis filter onto the space spanned
by the filters above it and all the synthesis filters delayed by
multiples of M samples. The resulting A(z) is FIR and the coding


4.  SIMULATION  RESULTS 

In the following table the coding gains of the different schemes are
compared for the 2-channel case for an AR(1) source with correlation
coefficient ρ = 0.95. Column 2 gives the ideal coding
gain, and column 3 gives the coding gain for the finite order case.
The details are: a paraunitary filter bank of order 11; a half-whitening
filter bank using the same paraunitary filters cascaded with
half-whitening filters of order 3; and a full-whitening filter bank, the same as
the half-whitening case, with colouring filters $A_0(z)$ and $A_1(z)$ of order
3.


scheme                coding gain
                      ideal        FIR
DPCM                  10.11 dB
paraunitary [8]       5.96 dB      5.56 dB
half-whitening [6]    8.16 dB      7.75 dB
full-whitening        10.11 dB     9.78 dB

Figure 3 shows the low pass analysis filter responses of
the half-whitening, FW, and paraunitary filter banks used for
the simulation. The dip in the low pass analysis filter response near
ω = 0 clearly shows the whitening action of the filter bank. The
full-whitening solution has almost twice as much dip in dB as the
half-whitening solution.
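
Two of the ideal-case figures quoted for the AR(1) source can be reproduced with a few lines of numerical integration. This is a sketch under stated assumptions: the AR(1) psd is normalized to unit variance, the ideal paraunitary case is taken to be a two-channel brick-wall split at $|\omega| = \pi/2$, and integrals are approximated by means over a uniform frequency grid. All variable names are illustrative.

```python
import numpy as np

rho = 0.95
N = 1 << 16
omega = -np.pi + (np.arange(N) + 0.5) * 2 * np.pi / N

# psd of a unit-variance AR(1) process with correlation coefficient rho
S = (1 - rho**2) / (1 - 2 * rho * np.cos(omega) + rho**2)
var_x = S.mean()

# Ideal DPCM gain: signal variance over exp(mean log-psd), as in equation (11).
cg_dpcm_db = 10 * np.log10(var_x / np.exp(np.log(S).mean()))   # about 10.11 dB

# Ideal two-channel brick-wall split: each subband variance is the mean of the
# psd over its band (segment stretched by 2 after decimation).
low = np.abs(omega) < np.pi / 2
var_low, var_high = S[low].mean(), S[~low].mean()
cg_pu_db = 10 * np.log10(var_x / np.sqrt(var_low * var_high))  # about 5.96 dB
```

The first value matches both the DPCM and the full-whitening ideal entries, which is what Theorem 1 predicts; the second matches the ideal paraunitary entry.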
Fig. 4. SNR versus bit-rate curves for actual quantization of an
AR(1) source having ρ = 0.95. Bold curve is for the FW filter bank,
dashed curve is for the half-whitening filter bank, dash-dotted
curve is for the paraunitary filter bank and dotted curve is for
scalar PCM.

The actual coding gain improvement may differ from the theoretical
one, especially at low bit rates, because the quantizer model
becomes less accurate and because we used the high-resolution
quantization assumption that the variance of the signal plus the quantization
noise filtered through the colouring filter is not much different
from the variance of the original signal. Simulations of actual
quantization show that good gain is obtained even
at very low bit rates. The variation of SNR with bit-rate is given
in Figure 4. Coding gain improvement for a given filter bank
is illustrated by simulations carried out on the same AR source.
The table below gives the theoretical coding gain improvement
obtained for several standard two-channel biorthogonal filter
banks, along with the measure of non-orthogonality introduced in
Lightstone et al. [3].


Filter bank           Relative gain (dB)    Measure of non-orthogonality
Egger-Li 4-12 [5]     0.5195                1.2874
Legall 3-5 [2]        0.1633                0.3887
Moulin 1-3 [7]        0.8805                2.9671
Moulin 5-11 [7]       1.2689                2.5587

The actual relative gain values agree with the theoretical gains even
at low bit rates. We also consider the variation of the improvement
in coding gain with the order of the colouring filter A(z). Though
the optimum A(z) is FIR, a lower order A(z) may be preferred
because of complexity considerations. It is seen that the largest
increase in the relative gain comes when the order is increased from
0 to 1, so that good improvement is obtained even with a first
order A(z).

5.  CONCLUSION 

In this work we proposed the optimal biorthogonal filter bank with
minimization of quantization noise amplification. In the ideal case
this led to a filter bank which attained the maximum possible coding
gain, namely, the coding gain of the ideal DPCM coder. In
contrast, the previously known optimal biorthogonal filter bank
achieved this bound only when the number of channels tends to
infinity. Given a biorthogonal filter bank, the coding gain was optimized
by minimizing the quantization noise amplification through modification of
the quantization noise. By simulating actual quantization
of an AR(1) source using finite order filters, we showed
that a very good coding gain advantage is obtained for the optimal
biorthogonal filter bank structure in the finite order case even at
very low bit-rates.
ABSTRACT

Wavelet packets have been found to be promising candidates for user
signature waveforms in code division multiple access
communication systems. The waveforms are usually chosen from an
orthonormal basis so that there will be no interference between
different users. However, timing errors may cause these
signature waveforms to lose orthogonality to each other. In this
paper, we describe a signal set which utilizes double
orthogonality based on wavelet packets and binary Walsh codes. This
double orthogonality produces auto- and cross-correlations that
are much better than those of the conventional wavelet packet sets and
are comparable to those of pseudo random binary codes. The
double orthogonality may also enable easy implementation in low
complexity receiver design.

Key  Words  Multirate  Processing,  Wavelets,  CDMA 

1.  INTRODUCTION 

Wavelet packets have properties that make them good candidates
for spreading codes in a Code Division Multiple Access
(CDMA) system. By arbitrary pruning of a binary wavelet
packet construction tree, an orthonormal and complete wavelet
packet basis set can be constructed effectively. This provides
perfect spreading codes that have zero cross-correlations, thereby
eliminating multiple access interference in the absence of
synchronization error. Wavelet packet based methods also have the
advantage of naturally enabling multirate communication, and
much work has been devoted to user signature waveform designs
using wavelet packets for improving cross-correlation properties
over pseudo random codes [1]. Learned et al. [2] not only used
wavelet packets as user signature waveforms but also designed
an optimal joint detector which achieved a lower complexity
compared with conventional CDMA optimal receiver designs.
Lindsey [3] found that by carefully choosing the wavelet packet
basis, a wider selection of time-frequency tilings of wavelet
packets could achieve a much better match of the transmission signal
with the channel. Based on this observation, a method called
wavelet packet modulation (WPM) was proposed and shown
to give a significant improvement in communication performance
over Quadrature Amplitude Modulation (QAM) [9]. All of these
works assume perfect timing of the signature waveforms.

The approaches in the literature using wavelet packets as
spreading user waveforms differ from the conventional CDMA
systems in the following ways. First, they are more like FDMA
or TDMA systems. Each user mainly occupies a relatively small
portion of the available bandwidth, or transmits mainly in a small
portion of the symbol duration. Thus, compared with the
conventional CDMA system, this kind of multiple access may suffer
from narrow band or impulsive interference if no information
about the interference is available. In addition, the
narrow band waveforms tend to behave like periodic functions,
i.e., their autocorrelations have more than one peak. This makes
the synchronization task difficult in the receiver. Since the
waveform set is generated from the nodes of the lowest level
(beginning with a length 1 signal) and some higher levels of a binary
wavelet packet tree, some of the waveforms are simply shifted
versions of one another. Thus, this approach requires good, if
not perfect, synchronization.

This work is supported in part by AFRL/IFKD under Contract F30602-00-
C-0086 through GIRD Systems, Inc.

However, timing error cannot be ignored in some cases, such
as in reverse link communication in a cellular system. Wong et
al. [4] investigated the timing error effect and derived an
algorithm to optimize the wavelet packet design in a wavelet packet
division multiplexing system. It has been shown that a lower
error probability than with commonly used wavelet packets can be
achieved using the optimum design. Hetling et al. [1] [5]
investigated the possible interference from another user waveform
for an asynchronous communication channel. Sesay et al. [6] also
investigated the multiuser interference from the perspective of
auto- and cross-correlation functions and error probability in a
waveform division multiple access system. Other researchers
have also discussed the interference in such a spread spectrum system
[7]-[8]. Most of these works propose alternative wavelet packet
filter designs to reduce the multiuser interference.

In this paper, we describe and investigate a doubly orthogonal
wavelet packet set, which utilizes wavelet packets and binary
Walsh codes. This double orthogonality produces much
better auto- and cross-correlations than the conventional wavelet
packet sets and may also enable low complexity receiver
design. Computer simulation results confirm the effectiveness of
this new waveform design.

2. DOUBLY ORTHOGONAL WAVELET PACKET SET

We propose a set of user waveforms as candidate spreading
codes for a CDMA system. The code waveforms should
have good autocorrelation and cross-correlation properties, and
the autocorrelation functions should have only one narrow peak.
This ensures the initial acquisition and the subsequent tracking
of synchronization. The cross-correlations between any pair of
waveforms in the set should be small enough that the multiple
access interference due to other users can be kept
at a minimum. The proposed doubly orthogonal wavelet packet
waveforms have the desired correlation properties.

Figure 1: Binary wavelet packet tree structure
Wavelet packet waveforms are generated by up-sampling and
filtering impulses from certain nodes of the binary wavelet packet
tree. Figure 1 shows the binary wavelet packet tree structure.
To generate a wavelet packet waveform, we begin from a certain
node and go up to the root of the tree by up-sampling and
filtering an impulse signal. The level and position of the node
determine how many times the up-sampling and filtering process
is applied and the type of filter used at each step, usually a low-pass or high-pass
quadrature mirror filter. From the eight level-3 nodes of the tree, we
can generate eight wavelet packet waveforms. The length of the
generated waveforms is determined by the length of the input
impulses. The shortest waveform which can be generated from
a level-3 node has length 8, if the input impulse has a length of
1. However, the filtering process will make the generated
waveforms fold several times. A higher filter level results in more
folding. As a consequence, some of the waveforms from
different nodes become just shifted versions of each other, and may
not be used in an asynchronous system.
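
The up-sampling and filtering procedure described above can be sketched as follows. Several choices here are assumptions rather than details given in the text: the 4-tap Daubechies filter pair is used as the quadrature mirror filters, the "folding" is modelled as circular (periodized) filtering, and the way a node index is turned into a sequence of low-pass and high-pass steps is a hypothetical convention.

```python
import numpy as np

def daubechies_4tap():
    """4-tap Daubechies low-pass filter h and its quadrature mirror high-pass g."""
    s3 = np.sqrt(3.0)
    h = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))
    g = np.array([h[3], -h[2], h[1], -h[0]])
    return h, g

def synthesis_step(c, f):
    """Up-sample c by 2 and filter circularly with f (output length is 2*len(c))."""
    n_out = 2 * len(c)
    out = np.zeros(n_out)
    for k, ck in enumerate(c):
        for m, fm in enumerate(f):
            out[(2 * k + m) % n_out] += ck * fm   # indices wrap around: folding
    return out

def wavelet_packets(levels=3):
    """One waveform per node at the given level, each started from a length-1 impulse."""
    h, g = daubechies_4tap()
    basis = []
    for node in range(2 ** levels):
        x = np.array([1.0])
        for lev in range(levels):
            # Hypothetical convention: bit `lev` of the node index picks the filter.
            x = synthesis_step(x, g if (node >> lev) & 1 else h)
        basis.append(x)
    return np.array(basis)

P = wavelet_packets(3)   # eight waveforms of length 8, one per level-3 node
```

With these assumptions the eight rows of `P` form an orthonormal set, since each periodized synthesis step preserves inner products and the low-pass and high-pass branches map into orthogonal subspaces.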

Now consider the proposed doubly orthogonal wavelet packet
waveforms. For simplicity, we consider a 64 user CDMA
system. Instead of generating all 64 waveforms from 64 level-6
nodes, we divide the users into 8 groups, named A through H,
each containing eight users. We also divide each symbol into 8
chips in the time domain. Each group of 8 users is associated with
one length 8 Walsh code as the chip code for each symbol
interval. Due to the orthogonality of the Walsh codes, these 8 groups of
users have waveforms orthogonal to each other. The 8 users in
one group are assigned orthogonal waveforms based on wavelet
packets. If we generate the wavelet packet waveforms from all
the eight level-3 nodes of a wavelet packet tree, we obtain
8 orthogonal wavelet packet waveforms. We number these wavelet
packet waveforms from 1 to 8. The proposed doubly orthogonal
waveforms are then generated by mapping the 8 wavelet packet
waveforms to the 8 chips of the Walsh code. Eight
different orderings of the mapping enable us to fit 8 users
in one Walsh code. Thus in total we can generate 64 different
user waveforms. The mapping is simply done by multiplying
the wavelet packet waveforms by the Walsh code chip value,
i.e., 1 or -1.

Figure 2 is an example of the mapping matrix which defines
the 8 different orders of mapping wavelet packet waveforms to
Walsh code chips of all 1's, i.e., user group A. The numbers
shown in the 8 x 8 matrix are the wavelet packet waveform
numbers (1 to 8). Each column in the mapping matrix corresponds
to one Walsh code chip, or one time slot. Each row in the matrix
corresponds to one possible order of mapping 8 wavelet packet
waveforms to 8 chips. This defines a unique user waveform.
Eight rows define eight user waveforms in one user group. For
example, the third row specifies that the waveform for user 3 in
the group is generated by concatenating wavelet packet
waveforms 3, 7, 4, 5, 8, 2, 6, and 1. This particular order makes the
waveform unique. Other users have different mapping orders so
that the generated waveforms differ from that of user 3. Note
that in each time slot the 8 users have been mapped to 8
different wavelet packet waveforms. This ensures that the 8 user
waveforms in the same user group are orthogonal to each other.
Since all of the rows define mappings of all the 8 wavelet packet
waveforms to the Walsh chips, all user waveforms will occupy
the entire frequency bandwidth as well as all the time slots.

Since different users occupy distinct wavelet packet
waveforms in any of the time slots, it is convenient to represent the
signal using permutation notation. Using the above example, the
eight rows or eight columns in the mapping matrix are different
permutations of $X_8 = \{1, 2, 3, 4, 5, 6, 7, 8\}$. Note that absence
of repetition of the 8 wavelet packet waveforms in each column
is important to ensure orthogonality, whereas such absence in
each row is not essential, although desirable. The constructed
signal set of group A users using the above example is

$$s_k(n) = \sum_{i=1}^{8} p_{3,\sigma_i(k)}(n - 8(i-1)), \qquad k = 1, \ldots, 8 \qquad (1)$$

where $\sigma_i(k)$ $(i = 1, 2, \ldots, 8)$ are eight permutations of $X_8$ for
the kth user, and $p_{3,l}$ $(l = 1, 2, \ldots, 8)$ are the eight wavelet packet
waveforms, each of length 8.

For user group B, the Walsh code is, e.g., 1, 1, 1, 1, -1, -1, -1, -1.
Then the mapping matrix would be the same as Figure 2,
except that in the last four columns all wavelet packet waveforms
need to be multiplied by -1. Other user groups follow Figure 2
and the corresponding Walsh codes in a similar way. Thus the
constructed signal of the kth user in the jth group is

$$s_{j,k}(n) = \sum_{i=1}^{8} W_j(i)\, p_{3,\sigma_i(k)}(n - 8(i-1)), \qquad j = A, B, \ldots, H, \quad k = 1, 2, \ldots, 8 \qquad (2)$$

where $W_j(i)$ $(i = 1, 2, \ldots, 8)$ is the jth length 8 Walsh code.
Since the Walsh codes form an orthogonal basis, user waveforms
with the same wavelet packet mapping order but in different groups
are also orthogonal to each other. For example, user 1 in group
A and user 1 in group B have the same wavelet packet mapping order, but because
the wavelet packet waveforms are multiplied by two orthogonal
Walsh codes, these two waveforms are orthogonal to each other.
User waveforms in different groups and with
different mapping orders are also orthogonal to each other, because in
every time slot they use different wavelet packet waveforms. Thus,
the 64 user waveforms form an orthogonal set.
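
The construction in equation (2) can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the orthonormal length-8 waveforms are replaced by a stand-in orthonormal basis obtained from a QR factorization, the Walsh codes are taken as the rows of a Sylvester-type Hadamard matrix, and the mapping matrix is a hypothetical cyclic arrangement (not the one in Figure 2) whose columns contain no repeated waveform index.

```python
import numpy as np

def walsh_codes(m):
    """Rows of a Sylvester-type Hadamard matrix; m must be a power of two."""
    H = np.array([[1.0]])
    while H.shape[0] < m:
        H = np.kron(np.array([[1.0, 1.0], [1.0, -1.0]]), H)
    return H

def doubly_orthogonal_set(P, W):
    """P: L orthonormal waveforms (rows); W: M x M Walsh codes. Returns M*L waveforms."""
    L, M = P.shape[0], W.shape[0]
    # Hypothetical mapping matrix: user k, time slot i -> waveform (k + i) mod L.
    # Each column holds L distinct indices, which is what orthogonality requires.
    order = (np.arange(L)[:, None] + np.arange(M)[None, :]) % L
    waveforms = []
    for j in range(M):                  # user group, i.e. Walsh code
        for k in range(L):              # user within the group
            chips = [W[j, i] * P[order[k, i]] for i in range(M)]
            waveforms.append(np.concatenate(chips))
    return np.array(waveforms)

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((8, 8)))
P = Q.T                                  # stand-in for the eight wavelet packets
S = doubly_orthogonal_set(P, walsh_codes(8))
gram = S @ S.T                           # 64 x 64 matrix of inner products
```

For this sketch `gram` equals 8 times the identity matrix: every waveform has energy 8 (eight unit-energy chips) and all 64 waveforms are mutually orthogonal at zero relative shift, for the reasons given in the text.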

This algorithm can be generalized to achieve a tradeoff
between the autocorrelation and cross-correlation properties of the
waveforms. If the desired length of the waveforms is $N = 2^{j+k}$,
we can divide the users into $M = 2^j$ groups, and generate wavelet
packet waveforms of length $L = 2^k$. An orthogonal waveform
set can be formed by combining the wavelet packet waveforms
according to the above algorithm. The number of waveforms in
the set is N = M x L. A tradeoff between the autocorrelation
and cross-correlation properties can be achieved with different
combinations of M and L. In general, a smaller M and a larger
L result in better cross-correlation but poorer autocorrelation,
and vice versa.

3.  CORRELATION  PROPERTIES  OF  THE 
WAVEFORMS 

Now, we investigate the autocorrelation and cross-correlation
properties of the waveforms proposed in the last section. As
an example, we choose the Daubechies 4 wavelet as the mother
wavelet from which a wavelet packet tree is constructed. The
reason is that the order of the filter is lower than for other wavelets,
because Daubechies wavelets have minimum size of support.
The correlation functions we investigate are the discrete
periodic auto- and cross-correlation functions defined as

$$R_i(k) = \frac{1}{N}\sum_{n=0}^{N-1} s_i(n)\,s_i(n+k) \qquad (3)$$

and

$$C_{ij}(k) = \frac{1}{N}\sum_{n=0}^{N-1} s_i(n)\,s_j(n+k) \qquad (4)$$

where N is the waveform length. We have also investigated the
averaged cross-correlation functions defined as

$$\bar{C}_i(k) = \frac{1}{N-1}\sum_{j=1,\,j\neq i}^{N} C_{ij}(k) \qquad (5)$$
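
The periodic correlation functions defined above translate directly into code. The sketch below assumes that the index n + k is taken modulo N, and that the averaged cross-correlation is a plain mean over all other waveforms in the set; the short test sequences and function names are hypothetical and serve only to show the behaviour for a waveform and its shifted copy.

```python
import numpy as np

def periodic_crosscorr(si, sj):
    """C_ij(k) = (1/N) * sum_n si(n) * sj((n + k) mod N), for k = 0..N-1."""
    n = len(si)
    return np.array([np.dot(si, np.roll(sj, -k)) for k in range(n)]) / n

def periodic_autocorr(s):
    """R_i(k), the periodic cross-correlation of a waveform with itself."""
    return periodic_crosscorr(s, s)

def averaged_crosscorr(waveforms, i):
    """Mean of C_ij(k) over all j different from i."""
    others = [periodic_crosscorr(waveforms[i], w)
              for j, w in enumerate(waveforms) if j != i]
    return np.mean(others, axis=0)

x = np.array([1.0, -1.0, 1.0, 1.0])      # hypothetical length-4 waveform
y = np.roll(x, 1)                         # the same waveform delayed by one sample
z = np.array([1.0, 1.0, -1.0, -1.0])     # a third hypothetical waveform

R = periodic_autocorr(x)
C = periodic_crosscorr(x, y)
C_avg = averaged_crosscorr([x, y, z], 0)
```

Here `C` is simply `R` shifted by one sample, so the cross-correlation between a waveform and its shifted copy reaches the full autocorrelation peak at one relative shift. This is the effect that makes shifted wavelet packet waveforms unsuitable for asynchronous use.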

Figure  3  gives  an  example  of  the  autocorrelation  function  of  a 
length  64  doubly  orthogonal  wavelet  packet  waveform.  We  can 
see  that  this  autocorrelation  has  a  single  narrow  peak.  This  is 
similar  to  the  autocorrelation  function  of  the  length  63  Gold  code 
given  in  Figure  4.  Figure  5  gives  an  example  of  the  autocorrela¬ 
tion  function  of  a  length  64  wavelet  packet  waveform,  which  is 
much  worse. 

Figure 6 gives an example of the cross-correlation function
between a pair of doubly orthogonal wavelet packet waveforms.
Note that the cross-correlation is zero when the relative shift of
the two waveforms is zero. Comparing with the cross-correlation
function of Gold codes given in Figure 7, we find that the
cross-correlation of the proposed waveforms is at the same level
as that of Gold codes, but not as regularly distributed. For the
conventional wavelet packets, the cross-correlation is much
better than that of Gold codes on average [1]. However, in the wavelet
packet set many waveforms are shifted versions of one
another, which gives poor cross-correlation. Figure 8 gives such
an example of the cross-correlation function between a pair of
length 64 wavelet packet waveforms. Since these two
waveforms are shifted versions of each other, the cross-correlation
not only has large values but also has value 1 for some relative
shifts. This is not the case for the doubly orthogonal wavelet
packet waveforms.

Figure 9 gives an example of the averaged cross-correlation
function of one doubly orthogonal wavelet packet waveform.
Comparing with the averaged cross-correlation function of a Gold
code given in Figure 10, we find that the proposed waveforms are
at a similar level. Figure 11 gives the averaged cross-correlation
function of a length 64 wavelet packet waveform. We can see
that the cross-correlation of the doubly orthogonal wavelet packet
waveform is higher than that of a conventional wavelet packet
waveform. However, the proposed waveforms do not have any
large cross-correlation values like those of the conventional wavelet packet
waveforms in Figure 8.
Figure 3: Autocorrelation of a DOWP waveform

Figure 4: Autocorrelation of a length 63 Gold code

Figure 8: Cross-correlation between a pair of WP waveforms

Figure 7: Cross-correlation between a pair of Gold codes

Figure 11: Averaged cross-correlation of a WP waveform